**Sustainable Management, Wertschöpfung und Effizienz**

# Accounting and Statistical Analyses for Sustainable Development

Claudia Lemke

Multiple Perspectives and Information-Theoretic Complexity Reduction

### **Sustainable Management, Wertschöpfung und Effizienz**

#### **Series Editors**

Gregor Weber, Breunigweiler, Germany; Markus Bodemann, Warburg, Germany; René Schmidpeter, Köln, Germany

This series focuses on empirical and practical research in the fields of sustainable management and efficiency. Management systems in the context of energy, environment, sustainability, CSR, innovation, and risk, as well as integrated management systems, are just a few examples of what can be found here. A special focus is on the value such systems can offer in practice by supporting the implementation of the global Sustainable Development Goals (SDGs). National and international scientific works are published in English and German.

#### **Series Editors:**

Dr. Gregor Weber, ecoistics.institute; Dr. Markus Bodemann; Prof. Dr. René Schmidpeter, Center for Advanced Sustainable Management, Cologne Business School

More information about this series at http://www.springer.com/series/15909

Claudia Lemke Berlin, Germany

Dissertation Technische Universität Berlin, 2020

ISSN 2523-8620
ISSN 2523-8639 (electronic)
Sustainable Management, Wertschöpfung und Effizienz
ISBN 978-3-658-33245-7
ISBN 978-3-658-33246-4 (eBook)
https://doi.org/10.1007/978-3-658-33246-4

© The Editor(s) (if applicable) and The Author(s) 2021. This book is an open access publication.

**Open Access** This book is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this book are included in the book's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the book's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Planning/Editing: Carina Reibold

This Springer Gabler imprint is published by the registered company Springer Fachmedien Wiesbaden GmbH, part of Springer Nature.

The registered company address is: Abraham-Lincoln-Str. 46, 65189 Wiesbaden, Germany

### **Preface**

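Claudia Lemke's dissertation aims to develop a sustainable development indicator set that includes environmental, social, and economic indicators as well as a composite measure, is meaningfully applicable to multiple levels, and is constructed in a methodologically sound manner.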


To meet this objective, Claudia Lemke derives a profound conceptual framework of sustainable development. Theoretical principles for the assessment of contributions to sustainable development are outlined and an overview of assessment methodologies is provided. Because the thesis identifies indicator sets and composite indicators (i.e. indices) derived from them as an expedient method to meet conceptual requirements and assessment principles, the methodology of a novel index, the Multilevel Sustainable Development Index (MLSDI), is derived subsequently.

Weighting and aggregation are crucial steps in index construction. In terms of weighting, the thesis identifies statistical procedures as expedient to yield the most promising results, because they are able to account for the correlations of underlying variables from the environmental, economic, and social domains. Three specific techniques are identified and tested against each other: Principal Component Analysis (PCA), Partial Triadic Analysis (PTA), and the information-theoretic Maximum Relevance Minimum Redundancy Backward (MRMRB) algorithm. For aggregation purposes, geometric aggregation is identified as the only method that accounts for non-comparable and ratio-scaled indicators.

The methodology is applied to a sample of the German economy for the years 2008 to 2016 in the empirical part of the dissertation. A comparable assessment of different branches is performed within each of the three domains and the aggregated MLSDI is derived for selected branches of the German economy.

This work has far-reaching implications for research and practice. With regard to sustainable development research, major contributions include the inclusion of the multilevel perspective. A wide range of indicators from all three domains of sustainable development are integrated, and their interconnections are analysed in the statistical procedure of the innovative MRMRB algorithm. The thesis further uses open source data and makes all methodological choices transparent. Its implications for practice include the support of policy-level decisions, because a methodologically sound and comparable tool is proposed to assess the sustainability performances of different units of account. The MLSDI is further proposed as an alternative to the Gross Domestic Product (GDP) as a measure of societal wellbeing at the policy level, because economic growth is limited and the additional dimensions of environmental protection and social development need to be considered when assessing societal wellbeing.

Claudia Lemke's dissertation therefore represents an important contribution to the research field of how a comparable evaluation of sustainability performances of units of different size can be performed. The results are equally important for science and practice. I wish Claudia Lemke's work the attention it certainly deserves.

Berlin, July 2020 JProf. Dr. Karola Bastini

### **Foreword**

After submitting her dissertation to Technische Universität Berlin, Claudia Lemke joined Beiersdorf AG as a Supply Chain Sustainability Manager. Since 1882, the name Beiersdorf has stood for innovative skin care. We continuously develop our products and brands to win consumers' loyalty and trust through best-in-class quality. Nowadays, quality and trust do not only refer to the use phase of a product: the consumers of today – and even more the consumers of tomorrow – demand products with a reduced environmental impact as well as an increased value for society. Innovative value creation goes beyond improving the consumer's experience of product application. Sustainable production and consumption are among the great challenges of the 21st century, and global corporations in particular have to take on the responsibility to contribute to societal wellbeing by taking the entire value chain and life cycle of their products into account. Beiersdorf meets these increased demands and has publicly pledged to improve its environmental footprint and social impact at global level.

Beiersdorf quantifies its sustainable development performances according to the Global Reporting Initiative (GRI) and allocates its contributions to the Sustainable Development Goals (SDGs). These two guiding frameworks are the foundation of Claudia Lemke's dissertation. By aligning the corporate GRI framework and the societal SDG framework at indicator level, Claudia Lemke enables the measurement of corporate contributions to societal sustainable development. Moreover, by developing a methodologically sound sustainable development index from this newly aligned indicator base, Claudia Lemke facilitates benchmarking throughout all aspects of sustainable development. Benchmarking in turn facilitates decision making in modern-day corporations, which often deal with several competing priorities.

By co-funding the open access publication of Claudia Lemke's dissertation, Beiersdorf supports the public accessibility of this excellent theoretical and methodological research. Knowledge and education should not be exclusive; inclusiveness is part of sustainable development and of Beiersdorf's vision. We are proud to care beyond skin.

Hamburg, November 2020 Jean-François Pascal

Vice President Sustainability Beiersdorf AG

### **Acknowledgement**

The present dissertation was developed during my employment as a (senior) research associate at the economic research institute WifOR and later in the Field of Sustainability Accounting and Management Control at the Technische Universität Berlin under the supervision of JProf. Dr. Karola Bastini. This dissertation is submitted to acquire the academic degree of Doctor of Business and Economic Sciences (Dr. rer. oec.) at the Technische Universität Berlin. Parts of the dissertation are published in Lemke and Bastini (2020). I express my deepest gratitude to everyone who has supported me during my time as a doctoral student.

First, I am grateful to JProf. Dr. Karola Bastini for her supervision and far-reaching feedback. Her eager willingness and engaged passion for scientific debates contributed considerably to the successful completion of my dissertation project. I also thank Prof. Dr. Maik Lachmann, Chair of Accounting and Management Control at the Technische Universität Berlin, for being the secondary referee of my dissertation.

I am grateful to Prof. Dr. Dennis A. Ostwald for supporting my dissertation project during my tenure at WifOR with his stimulating visions and encouraging leadership. I am also thankful for fruitful methodological debates with Dr. Marcus Cramer. I thank Rita Bergmann for her secondary authorships of the first two working papers of my dissertation project as well as her strengthening joy and ease in life. I appreciate the permission to include data on the German health economy by Jochen Puth-Weissenfels, Federal Ministry for Economic Affairs and Energy (BMWi).

Furthermore, I thank Fares Getzin for exchanging valuable thoughts and mutual motivation on the progress of our dissertations throughout my time at the Technische Universität Berlin. I also appreciate the fruitful debates and the motivating moments with all other colleagues at the Technische Universität Berlin and WifOR.

Last but foremost, I thank my partner Alexander Andor for never-ending encouragement, tolerance, and patience in both good and bad times of my dissertation. I am also thankful to my friend Cordula Klaus for her long-lasting support and cheering spirits. I am grateful to my parents Soon Boon and Bernd Lemke as well as my sister Susanne Lemke for providing a network of safety throughout all ups and downs of my entire academic career.

Berlin, February 2020 Claudia Lemke

To Clea and all future generations to come

The publication of this work was funded by the Open Access Publication Fund of Technische Universität Berlin and the Beiersdorf AG.

### **Chapter 1**

### **Introduction**

"The world has enough for everyone's need, but not enough for everyone's greed." Mohandas K. Gandhi

#### **1.1 Background and motivation**

The latest Atlantic hurricane season ended with category-5 hurricanes such as Dorian (National Weather Service, 2019). Because of climate change, intense and damaging hurricanes are three times more frequent nowadays than 100 years ago (Grinsted, Ditlevsen & Hesselbjerg, 2019; McGrath, 2019). Likewise, scientific evidence suggests that climate change made Europe's major heatwave in 2018 more than twice as likely to occur (Schiermeier, 2018; World Weather Attribution, 2018). Less prominent in public debate, but at higher and more alarming risk than climate change, is the genetic biodiversity of the biosphere (Steffen et al., 2015). Extinction rates may be 100 to 1,000 times higher than corresponding natural background rates (Ceballos et al., 2015; de Vos, Joppa, Gittleman, Stephens & Pimm, 2015). These examples demonstrate the abandonment of the Holocene and the entry into the Anthropocene, a new geological era that is characterised by human activities that threaten fundamental Earth system dynamics (e.g. Griggs et al., 2013; Rockström et al., 2009b; Sachs, 2012). In addition, humanitarian crises persist. The number of people living in extreme poverty is declining, but projections estimate that 479 million people will remain in extreme poverty in 2030 (Roser & Ortiz-Ospina, 2019) – 479 million people too many.

Sustainable development and sustainability consist of three contentual domains: environmental protection, social development, and economic prosperity. Today's and tomorrow's human needs should be satisfied subject to respecting present and future environmental limits (Holden, Linnerud & Banister, 2017; WCED, 1987). Economic prosperity serves this purpose (UNCED, 1992). Traditionally, the satisfaction of needs is enabled by economic growth at the expense of the environment and social justice (A. B. Atkinson, 2015; Holden et al., 2017; Piketty, 2014). Decoupling the nexus of economic growth and environmental degradation or social deprivation is a current challenge for decision makers (Holden, Linnerud & Banister, 2014). Human-nature interactions in a complex socio-ecological system (Clark, van Kerkhoft, Lebel & Gallopín, 2016; Hall, Feldpausch-Parker, Peterson, Stephens & Wilson, 2017; WCED, 1987) are studied in sustainability science, with the objective of developing a solution-oriented agenda (Kates, 2015) for sustainable development and sustainability. Generally, sustainable development and sustainability are characterised by complexity, which may be held responsible for our unsustainable world. From an economic theory perspective, unsustainable outcomes are present due to market failures. Environmental and social externalities are not internalised (Patterson, McDonald & Hardy, 2017; Sala, Ciuffo & Nijkamp, 2015), and governmental regulation is required to correct them. At the moment, sustainable development and sustainability are visions of the future (White, 2013), and the goal is to turn the sustainable future into the present as soon as possible. Pursuing this goal is widely regarded as the major and most difficult challenge of today's society (van Poeck, Læssøe & Block, 2017).

<sup>©</sup> The Author(s) 2021

C. Lemke, *Accounting and Statistical Analyses for Sustainable Development*, Sustainable Management, Wertschöpfung und Effizienz, https://doi.org/10.1007/978-3-658-33246-4\_1

To take up the challenge of making our world environmentally and socially sustainable, measurement and assessment of sustainable development performances are indispensable. Only what is measured can be managed (e.g. Parris & Kates, 2003). Indicator sets are central for sustainable development measurement because they are able to capture complexity: Indicator sets can cover a wide range of aspects of the three contentual domains (Almássy & Pintér, 2018), multiple objects of investigation, long time series, and diverse geographical regions. Including an index or a composite measure in an indicator set yields further advantages. An index is a compressed description of a multidimensional state (Ebert & Welsch, 2004) and hence reduces complexity (Bell & Morse, 2018). The important focus in measurement is recaptured (Griggs et al., 2014), countering the disadvantage that a rich indicator set may cause more confusion than understanding (Wu & Wu, 2012). Several scholars even argue that a sustainable development index is necessarily required because such complexity cannot be mapped by standalone indicators (Almássy & Pintér, 2018; Costanza, Fioramonti & Kubiszewski, 2016; Hanley, Moffatt, Faichney & Wilson, 1999; Nardo et al., 2008; Ramos & Moreno Pires, 2013). Moreover, sustainable development indices have the potential to replace the Gross Domestic Product (GDP) as a measure of societal wellbeing (Costanza, Fioramonti & Kubiszewski, 2016; Costanza et al., 2014). GDP has been heavily criticised for being an insufficient measure of wellbeing because it only quantifies the size of an economy in terms of final goods and services (Costanza et al., 2014; Giannetti, Agostinho, Villas Bôas de Almeida & Huisingh, 2015; van den Bergh, 2009). In contrast, sustainable development indices are metrics that fulfil the ambitions of measures of wellbeing as they comprehensively describe environmental, social, and economic aspects. A further major advantage of sustainable development indices is their capability to explore interactions of individual sustainable development elements (Costanza, Fioramonti & Kubiszewski, 2016; T. Hahn & Figge, 2011). Knowledge about these interactions is a prerequisite for the effectiveness of coordinated actions and thus for maximising progress on sustainable development (Costanza, Fioramonti & Kubiszewski, 2016; ICSU & ISSC, 2015; Spaiser, Ranganathan, Swain & Sumpter, 2017; Weitz, Carlsen, Nilsson & Skånberg, 2018).

Several weaknesses and gaps are present in the field of sustainable development indicators and indices, which motivate this research. First, conceptual frameworks of sustainable development lack multiple perspectives (e.g. Baumgartner, 2014; Boron & Murray, 2004; Chofreh & Goni, 2017; Griggs et al., 2014; Maletič, Maletič, Dahlgaard, Dahlgaard-Park & Gomišček, 2014), such that previous sustainable development indicators and indices can only be applied to economic objects of the same aggregational size. However, a comparable multilevel assessment of economic objects of any aggregational size is crucial because sustainable development is a society-level concept (T. Hahn, Pinkse, Preuss & Figge, 2015; Jennings & Zandbergen, 1995), and effects on the planet (macro level) are the cumulative results of individuals (micro level) (Dahl, 2012). Sustainable development and sustainability can only be achieved if micro and meso objects contribute (Griggs et al., 2014; Sachs, 2012). A positive side effect of this mandatory requirement of multilevel comparability is the provision of objective macro-economic benchmarks that prevent meso-economic objects such as corporations from greenwashing their sustainable development performances. The micro-to-macro connection is seen as the major challenge that scholars from business and economics face (McGregor & Pouw, 2017). The management literature calls for a meso-to-macro connection in order to stop missing the "big picture" (Whiteman, Walker & Perego, 2013). To the best of the author's knowledge, multilevel indicators and indices that address this perspective gap by being comparably applicable to micro (individuals), meso (organisations such as corporations), and macro objects (conglomerates of organisations such as industries or overall economies) are absent in the academic literature. This work is motivated by this call and aims to contribute to meeting this challenge.
Second, sustainable development and sustainability are mostly integrated at operational tiers while lacking strategic and normative tiers (Baumgartner & Rauter, 2017; Tseng, Lim & Wu, 2018). This operational-to-normative gap is a further reason for deficiencies in the progress towards sustainability. The conceptual part of this work will address the operational-to-normative gap. Third, a knowledge gap on interactions of individual sustainable development elements is present (see above), and generating insights about synergies and trade-offs of individual sustainable development elements is a subject of current research (e.g. Allen, Metternicht & Wiedmann, 2019; Nilsson, Griggs & Visback, 2016; Pradhan, Costa, Rybski, Lucht & Kropp, 2017; Spaiser et al., 2017; Weitz et al., 2018). This work is motivated by the knowledge gap and will contribute new methodological and empirical understandings. Fourth, bottlenecks in the science-practice linkage persist (Agyeman, 2005; Christie & Warburton, 2001; Hall et al., 2017; Sala, Farioli & Zamagni, 2013), further harming the progress towards sustainability. The empirical part of this work will contribute to closing this knowledge-to-action or sustainability gap. Fifth and last, previous sustainable development indices such as the Dow Jones Sustainability Indices (DJSI) (e.g. RobecoSAM, 2018a), Composite Sustainable Development Index (ICSD) (Krajnc & Glavič, 2005), Sustainable Development Goal Index (SDGI) (e.g. Schmidt-Traub, Kroll, Teksoz, Durand-Delacre & Sachs, 2017a), or the Sustainable Society Index (SSI) (e.g. van de Kerk, Manuel & Kleinjans, 2014) feature methodological shortcomings, such that decisions based on these metrics may be misled (Böhringer & Jochem, 2007; Mayer, 2008). This study therefore aims to make methodological contributions to the (sustainable development) index literature.

The following section, Section 1.2, explains how the present work will take up these challenges and fill the five identified research gaps, setting the research question and aim of this dissertation.

#### **1.2 Research question and aim of the dissertation**

Against this background, the present dissertation aspires to contribute to the science and practice community to accelerate progress in sustainable development. In doing so, it addresses the call that sustainable development demands performance measurement by an indicator set that includes a composite measure to replace GDP as a measure of wellbeing (see Section 1.1; e.g. Costanza, Fioramonti & Kubiszewski, 2016). It further acknowledges that multiple perspectives must be comparably captured (see Section 1.1; e.g. Dahl, 2012) in a methodologically sound manner to avoid misled decision making (see Section 1.1; e.g. Böhringer & Jochem, 2007). As multilevel sustainable development indices are not represented in the literature (see Section 1.1), the aim of the dissertation is to develop a sustainable development indicator set that includes a composite measure, with the following features: First, the indicator set should include environmental, social, and economic indicators as well as a composite measure; second, it should be meaningfully applicable to multiple levels; and third, it should be constructed in a methodologically sound manner. The newly derived index will be called the "Multilevel Sustainable Development Index (MLSDI)". Because of its multilevel applicability, the MLSDI will be able to support the challenge of managing the decoupling of economic growth from environmental degradation or social deprivation (see Section 1.1; Holden et al., 2014) at corporate, industry, and national levels.

This work will draw on prior research and will contribute to existing studies. First, Rotmans, Kemp and van Asselt's (2001) multilevel perspective is incorporated in the conceptual framework to tackle the perspective gap. Sustainable development indicators and indices will be identified as the most suitable multilevel assessment method, and a multilevel indicator set will be contributed. Second, the conceptual framework is amplified by the St. Gallen management model (Ulrich, 2001) for decision making at operational, strategic, and normative tiers. Third, this work will address the knowledge gap and contribute insights about interconnections of individual sustainable development elements. These interconnections will be investigated by three different, sophisticated weighting methods from the fields of multivariate statistics and information theory. The three weighting methods will be compared against each other, and the methods' sensitivities will be analysed. This procedure enhances previous studies in several ways: Compared to indices that apply equal weighting (e.g. the SDGI; Schmidt-Traub et al., 2017a), interconnections are studied; by contrast with indices that rely on expert elicitation (e.g. the ICSD; Krajnc & Glavič, 2005), objectivity, which is a critical sustainable development assessment principle (Sala et al., 2015), is ensured; in comparison with indices that do not study sensitivities, transparency and robustness, which are further central assessment principles (e.g. Pintér, Hardi, Martinuzzi & Hall, 2018; Sala et al., 2015), are improved. Fourth, this work contributes to closing the sustainability gap by delivering a sustainable development index that can be re-built and re-used, given the full transparency in its methodology, data sources, and empirical findings. The present work will contribute 44 sustainable development indicators of the environmental, social, and economic domains that originate in an alignment of the meso Global Reporting Initiative (GRI) and the macro Sustainable Development Goal (SDG) frameworks (GRI, 2016; UN, 2018), three subindices, one for each contentual domain, and an overall index, the MLSDI. The sample consists of 62 industries and five aggregated branches (Eurostat, 2008b), including the cross-sectional health economy (Gerlach, Legler & Ostwald, 2018), in the German economy from 2008 to 2016. Thereby, this study contributes objective benchmarks that may prevent greenwashing (see Section 1.1). The application is expected to be more useful than previous indices because a wider, multilevel scope of decisions can be covered: management decisions, national industry policy, and international affairs. Fifth and last, this work will contribute profound methodological knowledge to the (sustainable development) index literature. Methodological shortcomings of existing sustainable development indices will be highlighted by a systematic evaluation based on sustainable development assessment principles. The MLSDI will overcome these deficits through profound methodological research. It will further contribute to the (sustainable development) index literature by making use of methods from other disciplines that are not yet common in either sustainability science or business statistics. Identified deficiencies of previous sustainable development indices will involve insufficient data cleaning, weighting of the indicators, and aggregation into the composite measures as well as a lack of sensitivity analyses. The MLSDI is further expected to be more accurate for decision making because of its overall methodological soundness.
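The contrast between equal weighting and correlation-based statistical weighting, followed by aggregation into a composite measure, can be illustrated with a minimal sketch. It is not the MLSDI methodology: the scores are invented, the weights are taken from the squared loadings of the first principal component (one simple PCA-based convention among several), the aggregation is a weighted geometric mean, and the function names `pca_weights` and `geometric_index` are hypothetical.

```python
import numpy as np


def pca_weights(scores):
    """Indicator weights from the first principal component (illustrative only).

    scores: array of shape (n_units, n_indicators).
    Returns non-negative weights that sum to one.
    """
    # Correlation matrix between indicators (columns).
    corr = np.corrcoef(scores, rowvar=False)
    # eigh handles symmetric matrices and returns eigenvalues in ascending order,
    # so the last eigenvector belongs to the largest eigenvalue.
    _, eigvecs = np.linalg.eigh(corr)
    first_pc = eigvecs[:, -1]
    # Squared loadings of a unit-length eigenvector are sign-free and sum to one.
    return first_pc ** 2


def geometric_index(scores, weights):
    """Weighted geometric mean across indicators, one value per unit.

    Requires strictly positive scores, because logarithms are taken.
    """
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    return np.exp(np.log(scores) @ weights)


# Hypothetical, already normalised scores of four units on three indicators.
scores = np.array([
    [0.8, 0.6, 0.9],
    [0.5, 0.4, 0.7],
    [0.9, 0.7, 0.4],
    [0.3, 0.2, 0.6],
])

weights = pca_weights(scores)
equal_index = geometric_index(scores, np.ones(scores.shape[1]))
pca_index = geometric_index(scores, weights)

print("PCA-based weights:", np.round(weights, 3))
print("Equal-weight index:", np.round(equal_index, 3))
print("PCA-weight index:", np.round(pca_index, 3))
```

Comparing `equal_index` with `pca_index` for the same units gives a first impression of how sensitive a composite measure is to the weighting choice, which is the kind of comparison the sensitivity analyses mentioned above are meant to formalise.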

The next section, Section 1.3, outlines the procedure of this dissertation.

#### **1.3 Procedure**

To investigate and tackle the research gaps presented in Section 1.2, this work is structured as follows. The next chapter, Chapter 2, will derive a conceptual framework of sustainable development. Definitions of sustainable development and sustainability will be reviewed and adopted for this work. The conceptual framework will provide a guiding structure throughout the remainder of this dissertation. It will consist of six dimensions, two of which are major ones that require detailed examination. First, the three contentual domains of sustainable development – environmental protection, social development, and economic prosperity – will be explored and integrated into the framework. The contentual domains will constitute the topics and aspects of sustainable development that are to be mapped quantitatively. Second, the three major change agent groups of sustainable development – business, policy, and science – will be examined. The change agent group business will form the objects of investigation.

Chapter 3 will focus on measurement and assessment methods of sustainable development. Sustainable development measurement and assessment principles will be reviewed and harmonised in order to systematically evaluate diverse measurement methods and previous indices. An overview on sustainable development assessment methods will be given and the most suitable method for comprehensive multilevel sustainable development assessment will be determined. Previous meso and macro indices of sustainable development will be analysed.

In Chapter 4, profound methodological research on sustainable development index construction will be accomplished. First, an overview on the calculation steps will be given, and the assessment principles and further criteria will be allocated to the calculation steps they are relevant to. A systematic assessment of the reviewed indices' methodological approaches by means of the assessment principles and further criteria will follow. Last, the methodology for the new sustainable development index – the MLSDI – will be researched and explained.

In Chapter 5, the MLSDI will be applied to a sample of 62 industries as well as five aggregated branches, including the cross-sectional health economy, in the German economy from 2008 to 2016. The empirical findings will be described and analysed. This chapter will be structured according to the calculation steps of a sustainable development index.

The dissertation will conclude with a discussion of the research results and an overall summary and conclusion (see Chapter 6).

### **Chapter 2**

### **Conceptual framework of sustainable development**

In this chapter, a conceptual framework of sustainable development is developed through an extensive literature review. Along with this, the first four research gaps are uncovered. Jabareen (2009) defines a "conceptual framework as a network [...] of interlinked concepts that together provide a comprehensive understanding of a phenomenon or phenomena". Therefore, a conceptual framework is a result of theorisation, and it is required to understand soft facts and enable interpretations (Jabareen, 2009). Furthermore, it helps to navigate complexity (Pope, Bond, Hugé & Morrison-Saunders, 2017) and thereby supports decision makers during the implementation phase of sustainable development (Chofreh & Goni, 2017).

Among existing sustainable development frameworks (e.g. Baumgartner, 2014; Boron & Murray, 2004; Chofreh & Goni, 2017; Griggs et al., 2014; Maletič et al., 2014), comprehensive approaches are rare, and few bring the various aspects together. Hence, a synthesis and integration of multiple sustainable development dimensions is accomplished in this chapter. Established fragments are adopted, and novel elements are added.

To construct the conceptual framework, this chapter is structured as follows. Section 2.1 discusses distinct definitions of sustainable development and sustainability and adopts one for the remainder of this work. The underlying concepts of the three contentual domains of sustainable development – environmental protection (see Section 2.2.1), social development (see Section 2.2.2), and economic prosperity (see Section 2.2.3) – as well as their linkages (see Section 2.2.4) are presented in Section 2.2. Stakeholders and change agents of sustainable development are introduced in Section 2.3. Multilevel perspectives are presented (see Section 2.3.1), and the change agent groups business, policy, and science are debated in Section 2.3.2 to Section 2.3.4. The chapter ends with a summary (see Section 2.4).

### **2.1 Definition of sustainable development and sustainability**

The modern debate on sustainable development is led by the United Nations (UN), which has held world summits for more than 40 years and released the most elaborated concept of sustainable development (Lock & Seele, 2017). The start of its global agenda for change was the United Nations Conference on the Human Environment (UNCHE), which took place in Stockholm in 1972. At this conference, the foundation of the concept of sustainable development was clarified as the alignment of human development and the planet's environmental limits (Kates, 2015; UNCHE, 1972). 26 principles on the capacity of the Earth, social as well as economic development for a favourable living, and an action plan with 69 recommendations were worked out (UNCHE, 1972). Further elaborating on the concept of sustainable development, the World Commission on Environment and Development (WCED), also known as the Brundtland Commission, defined sustainable development as a development "that meets the needs of the present without compromising the ability of future generations to meet their own needs" (WCED, 1987). To this day, the definition is contemporary and even referred to as an "ethical standard" (Baumgartner, 2014). The centrepiece of this definition is the intergenerational justice (Jerneck et al., 2011) between today's and tomorrow's generations regarding two concepts: needs and limits (WCED, 1987). Intergenerational justice spans the first dimension of the sustainable development space: the temporal horizon. The second dimension of sustainable development deals with intragenerational justice regarding the two concepts. The United Nations Conference on Environment and Development (UNCED) subdivided this second dimension into three contentual domains: environmental protection (given the concept of limits), social development (given the concept of needs), and economic prosperity (UNCED, 1992).<sup>1</sup> These first two dimensions are visualised in Figure 2.1.
Although sustainable development is split into three contentual domains, these do not represent separate crises but are interdependent and mutually reinforcing, requiring simultaneous and integrated consideration (see Section 2.2.4; WSSD, 2002). Furthermore, sustainable development is a collective responsibility at local, national, regional,<sup>2</sup> and global levels (WSSD, 2002). This notion constitutes the third sustainable development dimension, the geographical region, depicted in Figure 2.2.

<sup>1</sup>Some authors, e.g. Jesinghaus (2018), interpret Agenda 21 as subdividing sustainable development into four domains: environment, society, economy, and institutions (UNCED, 1992). As institutions deal with the three contentual domains, a separation at the same level is not systematic, and is thus not adopted in this work. Confirming this view, SDG 17, "Partnerships for the goals", does not clearly span its own, institutional domain (see Figure 2.12b).

<sup>2</sup>The term "regional" may also refer to an area smaller than the national level (e.g. Ramos & Caeiro, 2010). However, the WSSD's (2002) classification is adopted in this work.

**Figure 2.1** The first two dimensions of the sustainable development space (based on Witjes et al., 2017; with kind permission of © 2017 The Authors)

Despite the fact that the UN's approach to sustainable development and sustainability now represents a global consensus (Costanza, Fioramonti & Kubiszewski, 2016; Vermeulen, 2018), both terms are controversially discussed in the academic literature. On the one hand, scholars such as T. Hahn et al. (2015); Lozano (2008); Sala et al. (2013); Shaker (2015); and Reid (1997) are in line with the UN's approach, interpreting sustainable development not as a steady state but as a journey or a process of change, adaptation, and learning. In contrast, sustainability is the ideal, dynamic state to achieve. In this case, the pathway of sustainable development ought to be pursued in order to obtain the long-term goal of sustainability (Dragicevic, 2018). On the other hand, authors such as Clark et al. (2016); Holden et al. (2014); and Waas et al. (2014) use both terms interchangeably. Further scholars such as P. James, Magee, Scerri and Steger (2015) argue the reverse: Sustainability is the capacity to persist over time, and therefore, it is a process to achieve the goal of sustainable development (Dragicevic, 2018). An overview of different approaches to sustainable development can be found in, e.g. Hopwood, Mellor and O'Brien (2005). Arising from the numerous existing definitions, other works intend to capture the terminology by generating a tag cloud of commonly used elements in definitions published in peer-reviewed literature (White, 2013). This approach might be questionable because, for example, in highly subjective areas such as the social domain of sustainable development (see Section 2.2.2), a larger group than the science community should be consulted. However, for merely identifying the main research domains, this reflective method might be legitimate (Kajikawa, Ohno, Takeda, Matsushima & Komiyama, 2007).

**Figure 2.2** The first three dimensions of the sustainable development space (based on Witjes et al., 2017; with kind permission of © 2017 The Authors)

The UN's approach to sustainable development is adopted for this work because it is the most profound and comprehensive (Biermann, Kanie & Kim, 2017; Lock & Seele, 2017) and has been agreed on by world leaders, which lends it a high degree of acceptance. Sustainable development is interpreted as a process that requires change and transformation (Lock & Seele, 2017; Sala et al., 2013) to a desired development path (T. Hahn et al., 2015) in order to reach the ideal, dynamic state of sustainability (Lozano, 2008; Reid, 1997), which is a long-term goal (Shaker, 2015). Where either sustainable development or sustainability could be used, the term sustainable development is preferred in the remainder of this work for brevity because sustainability has not yet been reached.

Dealing with sustainable development consists of two modes: first, a descriptive-analytical mode that aims to understand the human-nature interaction in a complex socio-ecological system; and second, a transformational mode that addresses the societal transition required to achieve sustainability (Clark et al., 2016; Hall et al., 2017; McGreavy & Kates, 2012; Schaltegger, Beckmann & Hansen, 2013; Spangenberg, 2011; Wiek, Ness, Schweizer-Ries, Brand & Farioli, 2012). The next section, Section 2.2, sheds light on the first mode and investigates the contentual domains of sustainable development, whereas the other two, already spanned dimensions (temporal horizon and geographical region) do not require further theoretical analysis due to their straightforwardness; they are directly incorporated in the methodological and empirical part (see Chapter 4 et seq.). Subsequently, Section 2.3 addresses the second mode, the stakeholders and change agents of the transition process, expanding the three-dimensional to a six-dimensional sustainable development space. The six-dimensional space is the final conceptual framework of sustainable development, required to adequately measure and assess sustainable development. In turn, the adequate assessment is the prerequisite for sustainable development management and its transition (see Chapter 3; e.g. Parris & Kates, 2003).

### **2.2 The three contentual domains of sustainable development**

The UNCED (1992) classified sustainable development into three contentual domains: environmental protection, social development, and economic prosperity (see Section 2.1). The following sections, Section 2.2.1 to Section 2.2.3, review and analyse the academic literature of these domains. Other segmentations such as the natural capital approach by Costanza and Daly (1992), the five capital approach by Porritt (2007), or the place-permanence-persons approach by Seghezzo (2009) are not further considered because these attempts "explain the composition of the cake by cutting it into thinner [or different] slices" (Hacking & Guthrie, 2008). The last section, Section 2.2.4, integrates the three domains into a unified dimension of sustainable development.

#### **2.2.1 Environmental protection**

In the academic literature of sustainable development, the use of the terms environment and ecology is not precise (e.g. Costanza, Fioramonti and Kubiszewski, 2016; Kates, 2015; and T. Hahn et al., 2015 vs. Hall et al., 2017; and Holden et al., 2014). Ecology is defined as "the branch of biology that deals with the relations of organisms to one another and to their physical surroundings" (Oxford Dictionaries, 2018a). In contrast, the environment is defined as (1) "the surroundings or conditions in which a person, animal, or plant lives or operates", or as (2) "the natural world, as a whole or in a particular geographical area, especially as affected by human activity" (Oxford Dictionaries, 2018b). Ecology refers to the relationship between an organism and its natural environment, whereas the environment as per definition (1) is something an organism possesses (Mebratu, 1998). In the context of sustainable development, the term ecology is too narrow because only the human-nature interaction would be regarded. The first definition of the term environment is too wide since it would include, in addition to the natural environment, the economic, political, and cultural environment (Mebratu, 1998). These aspects are already assigned to the other two domains – social development (see Section 2.2.2) and economic prosperity (see Section 2.2.3). Finally, the second definition of the environment suits the sustainable development context: The natural environment itself and the human-nature interaction are referred to simultaneously. It follows that, in this work, environmental protection is defined as the path to environmental sustainability, a state in which the natural world is neither harmed nor degraded by human activity, such that needs of today's generation are met without compromising needs of tomorrow's generation.

For highly anthropocentric reasons, the natural world is pointed at: The environmental system of the Earth is intended to remain stable because it provides life-supporting services to humans and is thus a prerequisite for thriving societies (Griggs et al., 2013; Kates, 2015; Steffen et al., 2015). Scientific insights deduced by the natural science community are at the centre of the environmental domain. The main focus is on limits or threshold values as well as interdependences of ecological and Earth system processes (Holden et al., 2017; Patterson et al., 2017; Sala et al., 2015). In particular, the research group around Rockström has advanced knowledge in this field. Their concept of planetary boundaries (Rockström et al., 2009a, 2009b; Steffen et al., 2015) perfectly reflects the UN's concept of limits (see Section 2.1). Planetary boundaries are threshold values of life-supporting Earth system processes above which unacceptable global environmental change might become unavoidable. This zone is the zone of high risk. The threshold itself lies in the zone of uncertainty that features an increasing risk. Below the boundary, the zone of safe operating space for humanity is located. Core boundaries are boundaries "each of which has the potential on its own to drive the Earth system into a new state should they be substantially and persistently transgressed" (Steffen et al., 2015). Nine planetary boundaries, two of which are core boundaries (climate change and biosphere integrity), are identified. Figure 2.3 displays the nine planetary boundaries and their current statuses of exploitation.<sup>3</sup> The planetary boundaries stratospheric ozone depletion, ocean acidification, and freshwater use are currently operating in the safe zone. Climate change and land system change are in the zone of uncertainty, while the boundaries biogeochemical flows and the biosphere integrity's subboundary genetic diversity are in the zone of high risk. For novel entities, atmospheric aerosol loading, and the subboundary functional diversity, thresholds could not be quantified yet.
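The three-zone logic of the planetary boundaries concept can be sketched in a few lines of Python. This is a minimal illustration and not part of Steffen et al. (2015): it assumes a control variable that rises with human pressure, and the names `Boundary` and `classify_zone` as well as all numerical values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Boundary:
    """One planetary boundary with its zone of uncertainty (hypothetical values)."""
    name: str
    boundary: float        # lower end of the zone of uncertainty
    high_risk_from: float  # upper end of the zone of uncertainty


def classify_zone(b: Boundary, value: float) -> str:
    """Assign a control-variable value to one of the three zones.

    Assumption: higher values mean more pressure on the Earth system.
    """
    if value <= b.boundary:
        return "safe operating space"
    if value <= b.high_risk_from:
        return "zone of uncertainty"
    return "zone of high risk"


# Hypothetical, normalised example: boundary at 1.0, high risk beyond 2.0.
example = Boundary("hypothetical process", boundary=1.0, high_risk_from=2.0)
for value in (0.6, 1.4, 2.5):
    print(value, classify_zone(example, value))
```

For boundaries whose control variable falls as pressure rises, the comparisons would have to be reversed.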

Despite the derivation from natural science, the concept of planetary boundaries draws on both objective and subjective matters. Measuring thresholds is objective, but assessing and setting the level of the boundaries is highly subjective because it implies defining the acceptable risk. Therefore, boundary setting is eventually a social decision (Griggs et al., 2014; Leach, Raworth & Rockström, 2013) that requires political decision making (see Section 2.3.3).

<sup>3</sup>Detailed descriptions of the planetary boundaries, their functioning, and role in the Earth system are not further outlined but can be found in Rockström et al. (2009b).

**Figure 2.3** Nine planetary boundaries and current statuses of exploitation (from Steffen et al., 2015; with permission of © 2015, American Association for the Advancement of Science)

#### **2.2.2 Social development**

Of the three contentual domains, the social domain of sustainable development is the least developed (Missimer, Robèrt & Broman, 2017a, 2017b). The concept remains open and contested (Boström, 2012), different meanings circulate, and there are difficulties in identifying purely social issues (Murphy, 2012). The literature is fragmented and limited (Ajmal, Khan, Hussain & Helo, 2018; Dempsey, Bramley, Power & Brown, 2011), such that a further development of this domain is required (see Section 6.3).

Murphy (2012) identifies four dimensions in the social domain of sustainable development: equity, awareness, participation, and social cohesion. Cuthill (2010) also points out four key concepts, though slightly different ones: social capital, social infrastructure, social justice and equity, and engaged governance. Overviews and more detailed concepts of the social domain can be found in, e.g. Ajmal et al. (2018); Boström (2012); Missimer et al. (2017a); Missimer et al. (2017b); and Murphy (2012). Core concepts include, among others, quality of life, wellbeing, subjective welfare, happiness, life satisfaction, social inclusion, dignity, affection, freedom, and safety (Harangozo, Csutora & Kocsis, 2018; Vavik & Keitsch, 2010). These involve material as well as non-material aspects, and their achievement is highly subjective and individually determined (McGregor & Pouw, 2017). Especially the former concepts rather refer to the developed world, where basic needs have been successfully addressed and higher order needs are in focus (Vallance, Perkins & Dixon, 2011).<sup>4</sup> Vallance et al. (2011) subdivide the social domain into three categories: development sustainability, bridge sustainability, and maintenance sustainability. Development sustainability addresses basic needs, justice, and equity, whereas bridge sustainability covers the changes in behaviour to achieve environmental sustainability. Maintenance sustainability aims to preserve socio-cultural patterns. In this work, the social domain is understood as development sustainability. Bridge sustainability and the notion of changes in behaviour is the underlying process of sustainable development in general, not only a means of obtaining environmental sustainability. Furthermore, social conditions correlate with environmental protection, but this linkage is not the focal point of the social domain. Maintenance sustainability is disregarded as the preservation of socio-cultural patterns is not necessarily desired. Thus, maintenance is not an overriding principle, but it is actively and explicitly governed. Further authors agree on the notion of development sustainability by Vallance et al. (2011): In the view of Ajmal et al. (2018); Holden et al. (2017); Stumpf, Baumgärtner, Becker and Sievers-Glotzbach (2015); and Stumpf, Becker and Baumgärtner (2016), social development is characterised by moral principles and philosophy on needs, equity, and justice. Needs are in-born requirements of humans to be physically, emotionally, and mentally healthy (Missimer et al., 2017a). Equity regards "situations in which the claimant is equally off" (Young, 1995), whereas justice is concerned with the "fair balance of mutual claims and obligations within a community" (Stumpf et al., 2015). Equality also appears frequently in the context of social development and deals with equal considerations as a claim holder or equal shares in distribution (Stumpf et al., 2015). Because equity and equality are principles of justice (Stumpf et al., 2015; Stumpf et al., 2016; Young, 1995), they become obsolete in working out the overarching concepts of the social domain. The guiding principle is justice on its own, supporting the concept of needs. Satisfaction of needs must be fairly balanced across regions (intragenerational justice) and time (intergenerational justice) (Dower, 2004; Stumpf et al., 2015). A definition of social development might therefore read: Social development is the path to social sustainability, a state in which human needs of today's generation are satisfied in a just manner without compromising the human needs of tomorrow's generation.

<sup>4</sup>Vallance et al. (2011) neither specify basic nor higher order needs. The concept of needs adopted in this work follows shortly.

Because the core of the social domain is human needs (see Section 2.1), concepts of human needs ought to be adduced in theorising this domain. The most well-known concept of human needs is the hierarchy of needs by Maslow (1943).<sup>5</sup> He points out that humans are motivated by in-born needs that are ordered hierarchically and can be visualised in a pyramid (see Figure 2.4). At the bottom of the pyramid are needs that humans first seek to satisfy. After their satisfaction, needs from a higher layer are desired to be met, until the top of the pyramid is reached. Physiological needs at the bottom consist of homeostasis and appetite needs. Safety needs include, among others, the need for security, protection, freedom from fear and chaos, as well as structure and law. Belongingness and love needs are the third step on the hierarchy of needs and refer to relations with other people to give and receive affection. Esteem needs can be categorised into two parts: first, self-esteem such as the desire for strength, achievement, competence, and confidence; and second, esteem of others such as desire for reputation, fame, recognition, attention, and dignity. The last stage consists of needs for self-actualisation, which Maslow (1987) described as the "desire to become [...] what one idiosyncratically is". In other words, humans desire self-fulfilment and seek to become actualised in what they potentially are (Maslow, 1943, 1987).<sup>6</sup> The principle of justice is applicable to every hierarchy level: justice among physiological needs at the bottom and justice among needs for self-actualisation at the top.

<sup>5</sup>Other works on human needs include, e.g. Max-Neef, Elizalde and Hopenhayn (1991), but are not further examined.

**Figure 2.4** Maslow's hierarchy of needs and the principle of justice (Maslow, 1943, 1987)

The concept of social boundaries is designed in analogy to the concept of planetary boundaries. Social boundaries represent thresholds above which basic conditions are met and below which critical human deprivations occur (Raworth, 2012, 2017). These boundaries comprise water, food, health, education, income and work, peace and justice, political voice, social equity, gender equality, housing, networks, and energy (see Figure 2.5). Water, for example, is measured as the "population without access to improved drinking water [and sanitation]", and food as the "population undernourished" (Raworth, 2017). The setting of the threshold values and current statuses of achievement as of Raworth (2017) are also displayed in Figure 2.5.<sup>7</sup> Although they reference the UN's approach, in particular the SDGs (see Section 2.3.3), Raworth's social boundaries are mainly applicable to the developing world, which is not in line with the UN suggesting a universally applicable approach. A merger of Maslow's hierarchy of needs, which includes needs of the developed and the developing world, with Raworth's concept of social boundaries yields a valuable conceptual framework of the social domain of sustainable development. In this connection, Maslow's hierarchy is dissolved into a circle of boundaries. The dissolution is legitimate because the hierarchy might not be significant; rather, needs might be independent of each other (Tay & Diener, 2011). An illustrative example is an artist who has not satisfied all material needs but is rich in terms of self-actualisation.

<sup>6</sup>Maslow (1972) added self-transcendence at the top of the pyramid. However, since he did not include it in his work in 1987, it is also disregarded in this work.

<sup>7</sup>Worldwide data set; in the majority of cases one year of calculation between 2008 and 2015.

**Figure 2.5** 12 social boundaries and current statuses of achievement (from Raworth, 2017; with kind permission of © The Author)

#### **2.2.3 Economic prosperity**

Economic growth or profits are often incorporated in the economic domain. However, neither economic growth nor profits are key to sustainable development, nor are they required for a broader conception of it (Jackson, 2009; McGregor & Pouw, 2017; Vermeulen, 2018). Even happiness does not necessarily require economic growth. Empirical evidence suggests diminishing marginal happiness in the course of a rising GDP per capita (p.c.) (Jackson, 2009). The misconception of economic growth or profits being key to sustainable development can be traced back to Elkington (1997) and the triple bottom line of people, planet, profit (Vermeulen, 2018).<sup>8</sup> This misconception is carried forward, and only 8% of reviewed corporate sustainable development literature negatively invokes the term triple bottom line (Isil & Hernke, 2017). Economic prosperity is the third contentual domain of sustainable development, and economic growth is only needed in places where human needs are not met in order to bring people out of poverty (Holden et al., 2014, 2017; McGregor & Pouw, 2017; WCED, 1987). In other words, the production of resources is only required to maintain a reasonable standard of living (Bansal, 2002). Prosperity is defined as the state of being successful in material and financial terms (Oxford Dictionaries, 2018c, 2018d). In contrast, Jackson (2009) does not define prosperity based on only material success, but prosperity further includes social and psychological aspects. However, as these aspects are already subsumed in the social domain (see Section 2.2.2), economic prosperity in this work follows the Oxford Dictionaries' definition: Economic prosperity is the path to economic sustainability, a state in which material and financial success is achieved, such that today's environmental limits and social (or human) needs are met without compromising future generations' limits and needs.

<sup>8</sup>Elkington (2018) himself requested to revise his framework of the triple bottom line. It was not designed to be an accounting tool that balances financial, environmental, and social aspects, but it intended to induce reflections about capitalism and its future.

**Table 2.1** Overview of (post-)growth literature streams

The effect of economic growth on sustainable development is ambiguous. On the one hand, economic growth might contribute to sustainable development because first, it might induce technological advancement required to mitigate environmental degradation (Holden et al., 2017; Stern, 2015; van den Bergh, 2011), and second, it might lift people out of poverty, improve social welfare, and satisfy human needs. On the other hand, economic growth might harm sustainable development as it typically entails environmental damages and might reduce social equality (A. B. Atkinson, 2015; Holden et al., 2017; Piketty, 2014) and justice. Because of this ambiguity, various streams of (post-)growth literature have emerged. These are presented in Table 2.1. Degrowth, negative growth, zero growth, steady state, positive growth, and green growth economies are disregarded by definition since the concept of sustainable development purports that economic growth is merely a means to an end. In contrast, an a-growth economy and a green economy comply with this notion: Economic growth is not a driving force, but human needs and environmental limits are centred.

Economic growth can be understood in terms of GDP, employment, consumption, production and further measures (EC, IMF, OECD, UN & World Bank, 2009). The most widely used economic performance measurement is the GDP, which is defined as the "monetary market value of all final goods and services produced in a country" (Giannetti et al., 2015; van den Bergh, 2009). GDP receives severe criticism for its construction and its use, while its founder, Kuznets (1934a, 1934b), was well aware of its shortcomings – or rather its pointedness. For instance, he was aware of the fact that GDP cannot measure economic welfare because the distribution of income and means of earning the income remain unknown. He even warned not to equate GDP growth with economic or social wellbeing (Costanza, Hart, Kubiszewski, Posner & Talberth, 2018; Costanza et al., 2014; Kuznets, 1934a, 1934b). Moreover, GDP does not differentiate between desirable and undesirable activities but counts all expenditures positively. For example, undesired clean-up costs of an oil spill lead to an increase in GDP (Cobb, Halstead & Rowe, 1995; Giannetti et al., 2015; Kubiszewski et al., 2013). GDP gives an incomplete picture by only including priced goods. Social costs such as environmental damages are known as negative externalities and remain unpriced with the result that GDP encourages the depletion of natural resources faster than their renewal rate (Costanza et al., 2018; Costanza et al., 2014; Giannetti et al., 2015; van den Bergh, 2009). Further limitations and examples can be found in, e.g. Cobb et al. (1995); Costanza et al. (2014); Giannetti et al. (2015); Kubiszewski et al. (2013); Stiglitz, Sen and Fitoussi (2009); and van den Bergh (2009). Even the argument that GDP positively correlates with wellbeing indicators such as life expectancy or literacy rate is not enough to justify using GDP as a measure of wellbeing because a correlation does not attest causality (van den Bergh, 2009).
However, GDP is not a wrong measure, but it is wrongly used (Giannetti et al., 2015; Stiglitz et al., 2009). Instead of attempting to measure welfare or progress, ending up with wrong conclusions, GDP's original purpose should be adhered to: GDP quantifies the size of an economy in monetary terms of final goods and services.

#### **2.2.4 Integration of the three contentual domains**

In the previous sections, Section 2.2.1 to Section 2.2.3, it has come to light that a strict separation of the three domains is not feasible, but the three domains are deeply interlinked (WSSD, 2002). To investigate the demanded synchronisation and coordination of the three subsystems nature, society, and economy (Bossel, 1998; Spangenberg, 2011), cross-disciplines such as environmental sociology, economic sociology (Boström, 2012), or ecological economics (e.g. Costanza & Daly, 1992) have emerged. The interlacing is driven by the socio-economic subsystem's embeddedness in and dependence on the global biophysical system (Griggs et al., 2014; Patterson et al., 2017; Sala et al., 2015). Changes in environmental circumstances (environmental domain) have resulted in economic gains (economic domain) but not for all people (social domain) (Kates, 2015; Turner II et al., 1990). The principles of limits and needs are combined, and clear cuts between the domains are challenging. Environmental pollution that pushes people back below the social foundation (Raworth, 2012) might be interpreted as an environmental-economic or environmental-social issue. Also, environmental pollution that arises from higher living standards (typically leading to pollution at global level) or environmental pollution that originates in poverty (mostly resulting in pollution at local level (WCED, 1987)) may be classified as environmental-economic or environmental-social problems. This example further evokes thoughts about environmental justice, and it illustrates the ambiguous correlation of income and environmental degradation: Higher living standards but also poverty can lead to environmental degradation. However, it is certain that people only engage in environmental protection if their basic needs are met (Bansal, 2002; Vallance et al., 2011). Similarly, corporations are more likely to engage with sustainable development if they feature a strong financial performance (Campbell, 2007). A more clear-cut example of the linkage of the environmental and the social domains is the discussion whether an environmental tax should be a fixed or progressive tax. Furthermore, the social and economic domains are closely intertwined as income and prosperity bring people out of poverty, ensuring a minimum wellbeing and typically enhancing social cohesiveness (Dragicevic, 2018). Here, ambiguities are also present because economic prosperity at a macro level might reduce social equality, a setback in social development (A. B. Atkinson, 2015; Holden et al., 2017; Piketty, 2014). The relationships among the three domains are illustrated in Figure 2.6. The arrows symbolise the direction of the relationship. Environmental protection and social development are both focal points and mutually dependent, whereas economic prosperity only serves the other two domains and should be adjusted according to their requirements.

On the conceptual side of integrating the three domains, the concepts of planetary and social boundaries are combined, obeying the UN's core concepts limits and needs. The result is the so-called safe and just space for humanity or doughnut for the Anthropocene (see Figure 2.7a; Raworth, 2012, 2017). The outer boundary represents the environmental ceiling and should not be exceeded. The inner boundary expresses the social foundation and should not be deceeded. Critical natural thresholds are located above the outer boundary, and critical deprivations of human needs occur below the inner boundary. As a result, the safe and just space for humanity is located below the planetary and above the social boundaries, respectively (O'Neill, Fanning, Lamb & Steinberger, 2018; Raworth, 2012, 2017). The current status of the safe and just operating space is

**Figure 2.6** Relationship of the three contentual domains

displayed in Figure 2.7b.<sup>9</sup>

Within the safe and just space, a range of possible pathways that could yield sustainability can be mapped. The preferred trail is highly subjective because it is a function of, among others, cultures, visions, values, costs, risks, and distribution of power (Leach et al., 2013). The existence of a range of possible pathways makes sustainable development a deeply political topic. The role of policy and its current goal setting will be further discussed in Section 2.3.3. Moreover, the range of possible pathways implies that weak sustainability can be applied. The notion of weak sustainability originates from capital theory and assumes substitutability of the different types of capital. Natural and manufactured capital can be reduced individually as long as the overall level of capital passed to future generations remains constant or grows (Cabeza Gutés, 1996; Figge & Hahn, 2004; Neumayer, 2010; Pearce & Atkinson, 1993; Pope et al., 2017; Sala et al., 2013). This type of sustainability is often represented in a Venn diagram (see Figure 2.8a), the most common graphical representation of sustainability (Dragicevic, 2018; Lozano, 2008; Mebratu, 1998). By contrast, strong sustainability assumes that the different types of capital are complements and need to be preserved for future generations (Costanza & Daly, 1992; H. E. Daly, 1990; Dragicevic, 2018; Figge & Hahn, 2004; Neumayer, 2010; Sala et al., 2013). Therefore, the capital with the shortest supply is a limiting factor (H. E. Daly, 1990; Dragicevic, 2018). The graphical representation of strong sustainability is often a concentric diagram (see Figure 2.8b; Dragicevic, 2018; Griggs et al., 2013; Lozano, 2008; Mebratu, 1998), with the environmental domain on the outside and the economic domain on the inside because the socio-economic subsystem is embedded in the global biophysical system (see above; Patterson et al., 2017; Sala et al., 2015).<sup>10</sup> Strong sustainability is in line with most

<sup>9</sup>Denotations and statuses of the boundaries slightly differ from Steffen et al. (2015; see Figure 2.3).

<sup>10</sup>Lozano (2008) suggests further graphical representations grounded in a critical review of the existing visualisations. Major criticism includes compartmentalisation of the linked domains and the missing representation of dynamics.

**(a)** The concept of the safe and just operating space for humanity

**(b)** Current statuses of the nine planetary and 12 social boundaries

**Figure 2.7** The safe and just operating space for humanity (based on/from Leach et al., 2013; Raworth, 2012, 2017; with kind permission of © ISSC, UNESCO 2013; © Oxfam International February 2012; © 2017 The Author)

ecological economists (e.g. Costanza & Daly, 1992; H. E. Daly, 2005; Holden et al., 2014; Isil & Hernke, 2017). The reasons are twofold. First, the anthropocentric, natural science perspective recognises that human outcomes depend on the functioning of the Earth system (O'Neill et al., 2018) and acknowledges that the limiting factor has become exactly this system (Costanza & Daly, 1992; H. E. Daly, 2005). Second, from an economic perspective, strong sustainability is required as natural and manufactured capital are often complements by their nature (Costanza & Daly, 1992). Synthesising Leach et al.'s (2013) and the ecological economists' viewpoints, weak sustainability, which is allowed within the safe and just operating space for humanity, should be accompanied by minimised substitutability to respond to both factor limitations and complementarity. However, outside the safe and just space, strong sustainability must be applied because factors of the environmental or the social domain are exhausted and thus become limiting factors. The environmental and the social boundaries must be known to determine whether weak or strong sustainability should be applied.
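The decision rule described above can be sketched in code. The following Python fragment is only a minimal illustration and not part of the framework itself: the `Boundary` class, the function name, and all numeric values are hypothetical, and the sketch assumes that each boundary can be reduced to a single scalar threshold.

```python
from dataclasses import dataclass
from typing import Iterable, List, Literal, Tuple


@dataclass(frozen=True)
class Boundary:
    """One boundary of the safe and just space (hypothetical structure)."""
    name: str
    kind: Literal["ceiling", "foundation"]  # environmental ceiling or social foundation
    threshold: float
    value: float  # current status, in the same unit as the threshold

    def is_respected(self) -> bool:
        # A ceiling must not be exceeded; a foundation must not be undershot.
        if self.kind == "ceiling":
            return self.value <= self.threshold
        return self.value >= self.threshold


def applicable_sustainability(boundaries: Iterable[Boundary]) -> Tuple[str, List[str]]:
    """Return the sustainability notion to apply and the breached boundaries."""
    breached = [b.name for b in boundaries if not b.is_respected()]
    # Inside the space: weak sustainability (with minimised substitutability).
    # Outside the space: strong sustainability, as a factor has become limiting.
    return ("strong" if breached else "weak", breached)


# Illustrative values only; they are not empirical data.
inside = [
    Boundary("example ceiling", "ceiling", threshold=100.0, value=80.0),
    Boundary("example foundation", "foundation", threshold=10.0, value=12.0),
]
outside = [
    Boundary("example ceiling", "ceiling", threshold=100.0, value=120.0),
    Boundary("example foundation", "foundation", threshold=10.0, value=12.0),
]
print(applicable_sustainability(inside))
print(applicable_sustainability(outside))
```

The sketch makes explicit why both sets of boundaries must be known: without thresholds for the ceiling and the foundation, the rule cannot be evaluated at all.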

After dealing with the descriptive-analytical mode of sustainable development by analysing the three contentual domains and their linkages, the next section, Section 2.3, examines stakeholders and change agents of sustainable development. These are prerequisites for the second, transformational mode of sustainable development that aims to put the normative concept of sustainable development into practice (see Section 2.1; Wiek et al., 2012).


### **2.3 Stakeholders and change agents of sustainable development**

At the start of the UN's debate on sustainable development in the 1970s, the UNCHE (1972) recognised that citizens, communities, enterprises, and institutions at any level should share equitable efforts in the sustainability transition. Groups or individuals that can affect or be affected by actions are stakeholders (Freeman, 1984, 2010; Hörisch, Freeman & Schaltegger, 2014). Change agents are defined as "internal or external actors that play a significant role in initiating, managing, or implementing change" (Caldwell, 2003; van Poeck et al., 2017). Because sustainable development requires change and transformation (see Section 2.1; e.g. Lock & Seele, 2017), it is desirable that all stakeholders become change agents who devote actions, behaviour, decision making, and solutions (Hall et al., 2017) towards sustainable development. Thus, the change agent group builds the fourth dimension of the sustainable development framework and can be arranged into four clusters: business, policy, society (Hajer et al., 2015), and science (Lock & Seele, 2017).<sup>11</sup> Each group acts on every sustainable development dimension. To facilitate the visualisation of the sustainable development space, the

<sup>11</sup>Lock and Seele (2017) divide change agents into several categories: companies, governments, Intergovernmental Organisations (IGOs), private citizens, non-governmental organisations, charitable organisations or non-profit organisations, grassroots organisations, media, future generations (albeit as passive stakeholders), and academia. For this work, this granularity is not required but the general structure is adopted.

**Figure 2.9** The first four dimensions of the sustainable development space

previously displayed cube is now disassembled into its six squares; each represents one sustainable development dimension. Figure 2.9 shows the visualisation of the first four dimensions of the sustainable development space: the temporal horizon, contentual domain, geographical region, and the change agent group. The fifth and sixth dimensions will follow in Section 2.3.1 and Section 2.3.2.

In the following section, Section 2.3.1, the multilevel perspective is discussed. It is a framework that combines the different change agents into one unified framework. Thereafter, the main change agent groups business, policy, and science are examined (see Section 2.3.2 to Section 2.3.4). The group society is not further investigated as deeper insights from sociology or further disciplines are beyond the scope of this work. However, society remains an indispensable change agent group in the sustainability transition as, for instance, private citizens can influence corporations by their consumer behaviour (Kucuk & Krishnamurthy, 2007) and politics by their voting decisions.

#### **2.3.1 The multilevel perspective**

In sustainable development, multiple perspectives are present (Lock & Seele, 2017; Seyfang & Haxeltine, 2012) for two reasons. First, various types of stakeholders exist and have myriad demands (Perez-Batres, Miller & Pisani, 2011). Second, sustainable development, which is a society level concept (see Section 2.3.2; e.g. T. Hahn et al., 2015), requires change and transformation (Lock & Seele, 2017) at multiple scales and across all sectors (Griggs et al., 2014) because effects on the planet are the cumulative results of individuals (Dahl, 2012). Both sustainability transition frameworks – the multilevel perspective and transition management – organise these multiple perspectives into three levels: micro, meso, and macro (e.g. Geels, 2002; Kemp, 1994; Köhler

et al., 2019; Loorbach, 2010; Markard, Raven & Truffer, 2012; Rip & Kemp, 1998; Rotmans et al., 2001; Smith, Voß & Grin, 2010). By doing so, the big picture and the broader problem framing can be captured (Smith et al., 2010), which is in turn necessary for a successful transition to sustainability. Only if multiple actors cooperate can their actions reinforce each other, leading to a successful transition (Loorbach, 2007). On the one hand, the multilevel perspective regards technological change for sustainable development and organises the analysis into niches (micro), regimes (meso), and landscapes (macro) (e.g. Geels, 2002; Kemp, 1994; Loorbach, 2007; Rip & Kemp, 1998; Smith et al., 2010). Niche is the level of innovation inside which novelties are created, tested, and diffused. A regime is the "dominant culture, structure and practice embodied by physical and immaterial infrastructures", whereas a landscape is defined as the overall societal setting (e.g. social values, political cultures, or economic trends), in which a process of technological change occurs (Loorbach, 2007).<sup>12</sup> Given its focus on technological change, this framework is not further regarded in this work. On the other hand, the transition management framework by Rotmans et al. (2001) is of relevance for this work because it is a decision-oriented framework that sorts the aggregational size of stakeholders and change agents of sustainable development into micro, meso, and macro. A micro object comprises individuals and individual actors, a meso object is composed of networks, communities, or organisations, whereas a macro object is a conglomerate of institutions or organisations. Because this framework also addresses micro, meso, and macro levels, it is also referred to as the multilevel perspective. Every stakeholder group can be divided into the three aggregational sizes. 
For example, business may be an individual economic agent (micro), a corporation (meso), or a branch or an overall economy (macro); policy may be a single politician (micro), a single national government (meso), or an IGO (macro); and so on. Figure 2.10 illustrates this novel dimension within the sustainable development space, which is disregarded in existing sustainable development frameworks (see Chapter 2; e.g. Chofreh & Goni, 2017). This perspective gap is closed by the present framework. The sixth and last dimension follows in the next section, Section 2.3.2, which deals with the change agent group business.

#### **2.3.2 Corporate sustainability**

Without dedication and leadership by corporations to sustainable development, sustainable development will not be reached (Sachs, 2012). Sustainable production and consumption are the major challenges of sustainable development (Sala et al., 2013; Weitz et al., 2018), and corporations represent the productive sources of the economy, producing and consuming resources (Bansal, 2002; T. Hahn & Figge, 2011).

<sup>12</sup>Further definitions of landscapes, regimes, and niches exist and can be found in, e.g. Geels (2002); and Rip and Kemp (1998).

**Figure 2.10** The first five dimensions of the sustainable development space

Analysing corporations with respect to sustainable development, T. Hahn and Figge (2011) developed three conceptual principles: instrumental finality, teleological integration, and practicability. First, instrumental finality is concerned with the determinateness of corporate sustainability and can be either organisational or societal (G. D. Atkinson, 2000; T. Hahn & Figge, 2011). Organisational sustainable development targets the long-term survival of the firm (G. D. Atkinson, 2000; T. Hahn & Figge, 2011), advancing financial performance by means of environmental and social issues (Dyllick & Hockerts, 2002). In other words, environmental and social issues only enter the equation to the degree of an opportunity for business success (T. Hahn & Figge, 2011). Sustainable development is seen as a source of value creation (Baumgartner, 2014; McWilliams & Siegel, 2011). To this end, corporate sustainability is defined as meeting the needs of a firm's direct and indirect stakeholders, without compromising its ability to meet the needs of future stakeholders as well (Dyllick & Hockerts, 2002). Societal sustainable development of the firm postulates corporate contributions to sustainable development at society level. The firm should only exist to the degree it contributes (G. D. Atkinson, 2000; T. Hahn & Figge, 2011). Societal instrumental finality is demanded because sustainable development is a society level concept (T. Hahn, Figge, Pinkse & Preuss, 2010; T. Hahn et al., 2015; Jennings & Zandbergen, 1995). Corporate sustainability must be about transposing the notion of sustainable development to the business level (Dyllick & Hockerts, 2002), such that corporate sustainability is conceptually linked to the Brundtland definition of sustainable development (Montiel & Delgado-Ceballos, 2014). Consequently, businesses themselves cannot become sustainable (T. Hahn et al., 2015; Jennings & Zandbergen, 1995), but their contribution at society level is wanted. 
The triple bottom line of people, planet, profit by Elkington (1997)

is not only a misconception in the society level concept of sustainable development but also in corporate sustainability. In the society level concept, economic prosperity and not economic growth is key to sustainable development (see Section 2.2.3; e.g. Vermeulen, 2018); for corporate sustainability, economic prosperity and not profit is likewise key as societal instrumental finality is required (see above). Furthermore, the defensive approach of corporate social responsibility is not enough because it only addresses corporations' responsibility to society and regards the moral obligation of managers (Bansal & Song, 2017). Only negative impacts of businesses on society are eliminated (Baumgartner, 2014; Carpenter & White, 2004), but contributions to sustainable development must be tackled by a scientific system perspective (Bansal & Song, 2017). This perspective is pursued by corporate sustainability and societal instrumental finality.

Second, teleological integration deals with the integration of environmental, social, and economic aspects (T. Hahn & Figge, 2011). This integration is seen as a major challenge in post-modern society and thus in corporate sustainability (Gladwin, Kennelly & Krause, 1995; T. Hahn & Figge, 2011; Taylor, 1989) as the interlinkages include tensions (T. Hahn et al., 2015). Tensions may arise along each sustainable development dimension visualised in Figure 2.10 and, in its final form, in Figure 2.11.<sup>13</sup> Four management approaches are identified that cope with tensions. The win-win perspective regards situations in which the three domains are in harmony, such that economic, social, and environmental objectives can be reached simultaneously (T. Hahn et al., 2010). The business case for sustainable development is realised (Dyllick & Hockerts, 2002; T. Hahn et al., 2010) by avoiding tensions through alignment of the three domains. This typically implies an economic bias, which is referred to as bounded instrumentality (T. Hahn & Figge, 2011; T. Hahn et al., 2010; van der Byl & Slawinski, 2015). The triple bottom line leads to bounded instrumentality. By limiting itself to profit maximisation, this perspective is likely to dismiss potential positive corporate contributions to sustainable development (T. Hahn et al., 2010). The trade-off perspective recognises that there are situations in which the three domains cannot be obtained simultaneously. Owing to the multidimensionality of sustainable development, these situations are the rule rather than the exception,<sup>14</sup> and thus, corporate sustainability is required to conceptually be able to deal with trade-offs (T. Hahn & Figge, 2011; T. Hahn et al., 2010). In this management perspective, tensions are avoided by choosing one sustainable development element over the other. Typically, profits are sought to be maximised (van der Byl & Slawinski, 2015). Thinking "beyond the business case" is required (Dyllick & Hockerts,

<sup>13</sup>According to T. Hahn et al. (2015), tensions may only arise along three dimensions: levels, process of change, and context. *Levels* refer to the aggregational size and can be individuals, organisations, or systems. This view is in line with the multilevel perspective by Rotmans et al. (2001) (see Section 2.3.1). *Process of change* regards the three contentual domains, and *context* refers to the temporal and spatial context (i.e. intergenerational and intragenerational aspects, respectively).

<sup>14</sup>In contrast, Pradhan et al. (2017) conclude in their empirical study that there are typically more synergies than trade-offs. Nonetheless, conceptual ability to deal with trade-offs remains essential because they have to be managed regardless of their relative frequency.

**Figure 2.11** The six-dimensional sustainable development space and the three conceptual principles of its management

2002; T. Hahn & Figge, 2011; T. Hahn et al., 2010, 2018; T. Hahn et al., 2015), and businesses should not have any a priori economic superiority (T. Hahn & Figge, 2011) but simultaneously address the three, interconnected sustainable development domains (T. Hahn et al., 2015). The integrative perspective requests managers to pursue different sustainable development aspects at once even if they are oppositional (T. Hahn et al., 2015). The focus is shifted from economic to environmental and social issues (van der Byl & Slawinski, 2015), and solutions for the entire system of interrelated elements are looked for (Gao & Bansal, 2013). Last, the paradox perspective explicitly acknowledges tensions (T. Hahn et al., 2018) by coexistence of oppositional elements (Clegg, Vieira da Cunha & Pina e Cunha, 2002; T. Hahn et al., 2015; Lewis, 2000). These situations are managed by first accepting the contradictions and second exploring them (van der Byl & Slawinski, 2015), such that managers are able to achieve competing objectives (T. Hahn et al., 2018). T. Hahn et al. (2018); T. Hahn et al. (2015); and van der Byl and Slawinski (2015) agree that in terms of teleological integration, the paradox perspective must be implemented. Notwithstanding, Landrum and Ohsowski (2018) find that the dominating mindset is the business case for sustainable development, which neither acknowledges the paradox theory nor tensions in general.

Third, practicability refers to the need to inform and guide decision makers effectively (Boron & Murray, 2004; T. Hahn & Figge, 2011). These three conceptual principles do not only apply to their original field of corporate sustainability but can be transferred to the management of sustainable development in general. Therefore, they enter the conceptual framework of sustainable development (see Figure 2.11).

The three conceptual principles – societal instrumental finality, paradox teleological integration, and practicability – should be embedded into all decisional tiers (Engert, Rauter & Baumgartner, 2016; Galbreath, 2009; R. Hahn, 2013), opening the sixth and last dimension of the sustainable development space. The decisional tier can be divided into three levels: normative, strategic, and operational (Baumgartner, 2014; Ulrich, 2001). The normative tier deals with the management philosophy and basic beliefs as well as values of the corporation that influence behaviours and decisions of management and employees (Baumgartner, 2014; Ulrich, 2001). The strategic tier is responsible for the effectiveness of the sustainability strategy. The process of planning, implementing, and evaluating effects is dealt with in order to achieve the long-term goals (Baumgartner, 2014; David, 2009). The operational tier is concerned with efficiency and implements normative and strategic goals (Baumgartner, 2014; Ulrich, 2001). This model is known as the St. Gallen management model (Ulrich, 2001). Similar to the conceptual principles, the decisional tiers are not only of relevance for corporate sustainability but for sustainable development management in general, entering the conceptual framework. The final version of the framework, with its six dimensions and three conceptual principles, is pictured in Figure 2.11. Despite the need to address all three decisional tiers, many corporations only integrate corporate sustainability at the operational tier (Engert et al., 2016; Galbreath, 2009; R. Hahn, 2013). This operational-to-normative gap is seen as the major reason for the lack of progress towards (corporate) contributions to sustainable development (Baumgartner & Rauter, 2017; Tseng et al., 2018) and is hence taken into consideration in the selection process of the sustainable development measurement method (see Section 3.1 to Section 3.2).
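As a reading aid, the six dimensions of the sustainable development space can be written down as a small data structure. The Python sketch below is an illustrative assumption rather than part of the framework: the class and field names are hypothetical, and the temporal horizon and geographical region are kept as free text because their categories are not enumerated in this section.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import product


class ContentualDomain(Enum):
    ENVIRONMENTAL = "environmental"
    SOCIAL = "social"
    ECONOMIC = "economic"


class ChangeAgentGroup(Enum):
    BUSINESS = "business"
    POLICY = "policy"
    SOCIETY = "society"
    SCIENCE = "science"


class AggregationalSize(Enum):
    MICRO = "micro"
    MESO = "meso"
    MACRO = "macro"


class DecisionalTier(Enum):
    NORMATIVE = "normative"
    STRATEGIC = "strategic"
    OPERATIONAL = "operational"


@dataclass(frozen=True)
class SpacePoint:
    """One position in the six-dimensional space (hypothetical structure)."""
    temporal_horizon: str      # free text; categories are an open assumption
    domain: ContentualDomain
    geographical_region: str   # free text; categories are an open assumption
    agent: ChangeAgentGroup
    size: AggregationalSize
    tier: DecisionalTier


# Example: a single corporation (business, meso) deciding at the strategic tier.
corporate_strategy = SpacePoint(
    temporal_horizon="long term",
    domain=ContentualDomain.ENVIRONMENTAL,
    geographical_region="national",
    agent=ChangeAgentGroup.BUSINESS,
    size=AggregationalSize.MESO,
    tier=DecisionalTier.STRATEGIC,
)

# Number of combinations across the four dimensions enumerated in this section.
enumerated_cells = list(
    product(ContentualDomain, ChangeAgentGroup, AggregationalSize, DecisionalTier)
)
print(len(enumerated_cells))
```

Even with only four of the six dimensions enumerated, the number of cells illustrates why complexity reduction becomes a concern for any measurement method built on this space.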

Generally, corporations need an incentive to engage in corporate contributions to sustainable development (T. Hahn & Figge, 2011; Husted & de Jesus Salazar, 2006). Incentives and drivers can be of internal or external nature (Lozano, 2015), and several theories exist to explain engagements in corporate sustainability. An overview of literature streams, their main assumptions, and example references from theory-building, summarising, or empirical studies is given in Table 2.2. The last column of Table 2.2 evaluates the fulfilment of the respective theory with the conceptual principles of societal instrumental finality and paradox teleological integration (see Figure 2.11). Practicability cannot be meaningfully evaluated in this context but will be taken up in Chapter 3. The natural resource-based view focuses on competitive advantage and maximisation of the firm, such that bounded instrumentality is present. The win-win or the trade-off perspective might be the managing view. Institutional, legitimacy, and stakeholder theories are driven by stakeholders and therefore may fit the criteria of instrumental finality and teleological integration if stakeholders desire or enforce these. Stewardship theory and sustaincentrism are the only theories that conceptually include societal instrumental finality and paradox teleological integration at any time. Consequently, corporations are encouraged to take actions to employ stewards and implement sustaincentrism in their organisation. Further studies on drivers of corporate sustainability include Engert et al. (2016); and Lozano (2015). Eccles, Ioannou and


Serafeim (2014) investigate the reverse direction and examine the impact of corporate sustainability on organisational processes and performances.

#### **2.3.3 Political goal setting: The United Nations' (UN) Sustainable Development Goals (SDGs)**

Policy making and the involvement of governments are inherent in sustainable development (Meadowcroft, 1997, 2011). The subjective nature of sustainable development means going beyond efficiency and deciding upon one of the multiple pathways (see Section 2.2.4; Leach et al., 2013), requiring negotiations in a democratic system (McGregor & Pouw, 2017). Moreover, governments exercise control by launching laws or regulations and by providing public goods such as infrastructure (Clarkson, 1995; Hood & Margetts, 2007; Lock & Seele, 2017). IGOs frame political interactions (Meadowcroft, 2011), and in this vein, the UN has released the most elaborate concept of sustainable development (see Section 2.1; Lock & Seele, 2017). Further international organisations such as the Organisation for Economic Co-operation and Development (OECD) and International Labour Organization (ILO) provide advice on political landscapes and legal frameworks for sustainable development in documents such as ILO (2013); and OECD (2016).<sup>15</sup> However, as in previous sections, this work continues to concentrate on the UN's approach to sustainable development. Section 2.1 has dealt with the normative concept of the UN's approach, whereas this section regards the strategic level and the release of development goals.

The first development goals were the Millennium Development Goals (MDGs). The MDGs are an integrated framework adopted by 189 countries around the world in 2000, aiming at social development and improved living standards of the world's poor (Glaser, 2012; Griggs et al., 2014; Sachs, 2012; UNGA, 2000). With the MDGs, measurable and time-bound objectives were set, promoting global awareness, political accountability, social feedback, and public pressure for sustainable development (Sachs, 2012). In 2015, the MDGs were replaced by the SDGs. The SDGs do not only embrace developing countries but are universally applicable to all countries and geographical regions (Glaser, 2012; Sachs, 2012). Given the third dimension of the sustainable development space (see Figure 2.11), an essential improvement is realised. The SDGs promote social development and economic prosperity in harmony with nature for all nations and are globally accepted as the content and meaning of sustainable development (Dahl, 2018; UNCSD, 2012; UNGA, 2015). There are 17 SDGs with 169 targets and 232 indicators in total. The goals and targets are agreed on by international negotiation, whereas the indicators are worked out and annually refined by an expert group (UN, 2018, 2019a; UNGA, 2015). With the numerous, quantitative indicators, progress can be monitored, policy may be informed, and accountability of all stakeholders can be

<sup>15</sup>The ILO focuses on labour topics and thus only regards the social or economic domain.

ensured (UN, 2019a). The SDGs are, similar to the MDGs, voluntary, time-bound targets (Glaser, 2012) and can be summarised as poverty elimination, sustainable lifestyles for all, and a stable resilient planetary life-supporting system (Griggs et al., 2014). In detail, the 17 SDGs read (UN, 2018):


Figure 2.12a displays the 17 SDGs and Figure 2.12b shows their allocation to the three contentual domains. Four goals are assigned to the environmental domain, eight goals belong to the social domain, another four goals make up the economic domain and

**(a)** Overview of the Sustainable Development Goals (SDGs) (UN, 2019b)

**(b)** Assignment of the Sustainable Development Goals (SDGs) to the three contentual domains (from/based on Folke et al., 2016; Rockström and Sukhdev, 2014; with kind permission of © 2016 by the authors)

**Figure 2.12** The Sustainable Development Goals (SDGs)

one goal, SDG 17 on partnership for the goals, cannot be assigned to any but affects all contentual domains.
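The allocation counts stated above (four, eight, four, and one) can be checked with a short tally. The Python sketch below is illustrative only: the text gives just the counts per domain, so the individual goal numbers are an assumption that follows the widely reproduced layered depiction attributed to Rockström and Sukhdev, and the names `ALLOCATION` and `check_allocation` are hypothetical.

```python
from typing import Dict, Set

# Assumed goal numbers per domain; only the counts are stated in the text.
ALLOCATION: Dict[str, Set[int]] = {
    "environmental": {6, 13, 14, 15},
    "social": {1, 2, 3, 4, 5, 7, 11, 16},
    "economic": {8, 9, 10, 12},
    "cross-cutting": {17},  # SDG 17 affects all domains
}


def check_allocation(allocation: Dict[str, Set[int]], n_goals: int = 17) -> bool:
    """Return True if every goal from 1 to n_goals is assigned exactly once."""
    assigned = [goal for goals in allocation.values() for goal in goals]
    return sorted(assigned) == list(range(1, n_goals + 1))


counts = {domain: len(goals) for domain, goals in ALLOCATION.items()}
print(counts)
print(check_allocation(ALLOCATION))
```

Moving a goal between domains, such as the debated placement of SDG 12, changes the per-domain counts but leaves the completeness check unaffected.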

The SDGs are critically discussed in the academic literature. On the positive side, they open the door to a unified framework of sustainable development (Griggs et al., 2014), and the level of ambition and comprehensiveness are the greatest so far in the history of political goal setting for sustainable development (Biermann et al., 2017). Similar to the MDGs, the SDGs place this goal setting at the centre of political agendas and generate worldwide commitments and actions (Glaser, 2012). The novel bottom-up, non-legally-binding approach is a key success factor as, among others, moral and practical commitments feature lower transaction costs as well as fewer delays than the classical top-down approach (Biermann et al., 2017; Hajer et al., 2015; Sachs, 2012). Nonetheless, the SDGs are explicit in the endpoint and may therefore clarify pathways to necessary end outcomes (Vermeulen, 2018). The SDGs are universally applicable (Glaser, 2012; Griggs et al., 2013; Sachs, 2012), and the small number of goals as well as their simplicity are essential for focus and effectiveness (Griggs et al., 2014; Sachs, 2012). Yet, the goals and targets are comprehensive (Pradhan et al., 2017). Besides, they are practicable (Sachs, 2012), measurable (Griggs et al., 2013), and science provides guidance on their framing (Glaser, 2012; Griggs et al., 2014), such that the important science-practice interlinkage is realised (see Section 2.3.4). To sum up, advocates claim major requirements of the sustainable development framework are met.

However, opponents of the SDGs do not interpret the bottom-up approach as a success factor but claim that an obligation for target fulfilment should be established. Otherwise, counterproductive drivers are supported, and only easily achievable targets might be chosen with the result that the full potential of the SDGs might be forfeited (Allen et al., 2019; Spangenberg, 2017). Furthermore, the global goals and targets must

be translated into corresponding efforts at the national level (Dahl, 2018). In addition, the SDGs are said to be vague, weak, or meaningless (Holden et al., 2017; Stokstad, 2015). 54% of the targets require further work and need to be strengthened by, for instance, determining endpoints and time frames for an accurate measurement. 17% of the targets are non-essential and can be disregarded (ICSU & ISSC, 2015; Stokstad, 2015). Spaiser et al. (2017) reinforce these qualitative assertions by empirical evidence derived by several multivariate techniques. They conclude that the economic domain is valid, the social domain is well represented, but the environmental domain is poorly defined and incoherent. Scholars generally agree that further research is demanded in the environmental domain; among other things, the planetary boundaries must be linked to the SDGs and broken down to national or corporate level (see Section 2.2.1 and Section 6.3; e.g. O'Neill et al., 2018; Whiteman et al., 2013). Further criticism involves that there are repetitions and that the environmental goals 12 to 15 are not quantifiable (Holden et al., 2017).<sup>16</sup> The author of this work does not agree with this criticism as the UN (2018) lists numerous solid, quantitative indicators. Nonetheless, the author agrees with Holden et al.'s (2017) criticism that the SDGs rest on wrong premises by balancing the three dimensions. The UN (2018) includes economic growth as a sustainable development indicator but does not specify a threshold above which economic growth is not required anymore. A further criticism is that having too many goals amounts to having no goal at all. Therefore, only relevant indicators should be chosen (see Section 3.1; Hák, Janoušková & Moldan, 2016; Holden et al., 2017; Janoušková, Hák & Moldan, 2018; Reyers, Stafford-Smith, Erb, Scholes & Selomane, 2017). Reyers et al. (2017) offer an approach to monitor the SDGs with only essential variables. 
Moreover, prioritisation of the SDGs is a prerequisite for effectiveness of actions. The SDGs are individually straightforward, but the system as a whole, its dynamics, synergies, and trade-offs have to be understood (Allen et al., 2019; Nilsson et al., 2016; Pradhan et al., 2017; Sachs, 2012; Spaiser et al., 2017; Weitz et al., 2018). This knowledge gap must be closed for maximising progress on the SDGs (Costanza, Fioramonti & Kubiszewski, 2016; ICSU & ISSC, 2015; Spaiser et al., 2017; Weitz et al., 2018), critically determining the selection process of the sustainable development assessment method (see Section 3.1 to Section 3.2). If decision makers ignore the interlinkages and overlaps, important contributions to sustainable development may be missed. However, decision makers require science-based assistance for complexity reduction and prioritisation. First works on SDG prioritisation include, for instance, Allen et al. (2019); Pradhan et al. (2017); and Weitz et al. (2018). New insights on the system dynamics, synergies, and trade-offs will be contributed by the empirical part of this work (see Chapter 5).

To sum up, the SDGs entail both risks and opportunities: The SDGs bear the risk of

<sup>16</sup>Folke et al. (2016); and Rockström and Sukhdev (2014) assign SDG 12 on responsible consumption and production to the economic domain. The author of this work rather agrees with Holden et al. (2017) and regards SDG 12 as an environmental goal (see Section 5.3.1).

creating a huge bureaucratic burden with failure of practical results, and they have the potential to transform the globe towards sustainable development. To reduce the risk of failure, the knowledge gap must be closed. This is a task for the science community (including this work), which is characterised in the next section, Section 2.3.4.

#### **2.3.4 Sustainability science**

Last, the science community is fundamental in the process of sustainable development because it crafts knowledge, facilitates the transition with the new knowledge, passes the knowledge on to young people in institutions of higher education, and publishes the information for the public (Bachmann, 2016; Barth, 2016; Clark, 2007; Clark et al., 2016; Folke et al., 2016; Lock & Seele, 2017). The discipline of sustainability science was initiated by Kates et al. (2001), decades after the start of the intergovernmental debate headed by the UN (see Section 2.1 and Section 2.3.3). Kates (2015); and Kates et al. (2001) raised seven core questions to be answered by the discipline, drawing on both the descriptive-analytical and the transformational mode (see Section 2.1; Wiek et al., 2012). The dual mission of sustainability science (Hall et al., 2017; McGreavy & Kates, 2012) shapes this discipline, always seeking solutions to real-world problems and being teleologically directed towards sustainable development (Spangenberg, 2011). Most important is the connection of science (knowledge) and practice (societal action and informed decision making), between which sustainability science creates a dynamic bridge (Clark, 2007; Kates, 2015; Sala et al., 2015; Turner II et al., 2003). To manage both the descriptive-analytical and the transformational mode, sustainability science needs to be transdisciplinary (Jahn, Bergmann & Keil, 2012; Lang et al., 2012; Schaltegger et al., 2013; Spangenberg, 2011). Transdisciplinary research is not only characterised by science-practice collaborations that focus on societally relevant problems and seek real-world solutions, but also by methodological pluralism and collaborations of various disciplines (Lang et al., 2012; Schaltegger et al., 2013; Spangenberg, 2011).<sup>17</sup> In sustainability science, pluralism is required to handle the complexity arising from the multidimensionality of the framework.
A conceptual agenda for transdisciplinary research can be found in Jahn et al. (2012); and Lang et al. (2012) and is reproduced in Figure 2.13. Societal and scientific practice work hand in glove. During the first phase (Phase A), a societal problem is identified and triggers the scientific research question. From this, the joint problem is framed, and collaborative teams from academia and practice are built, such that mutual learning among researchers and practitioners is enabled. In Phase B, solution-oriented and transferable knowledge is generated and disclosed. Subsequently, this knowledge is reintegrated and applied, leading to useful and relevant results for social and scientific practice in Phase C. This in turn loops back into Phase B and Phase A.

<sup>17</sup>A detailed differentiation of disciplinary, multidisciplinary, interdisciplinary, and transdisciplinary research can be found in Schaltegger et al. (2013).

**Figure 2.13** Conceptual agenda of a transdisciplinary research process (based on Jahn et al., 2012; Lang et al., 2012; with kind permission of © 2012 Elsevier B.V. All rights reserved; © Springer 2012)

Taking into account the research reviewed for this work, the discipline of sustainability science has accomplished Phase A to the point of being on hold for further feedback loops. Societal and scientific problems are framed, which, for example, resulted in the SDGs (see Section 2.3.3). The development of a sustainable development indicator set demands scientific knowledge production as well as political norm creation (Rametsteiner, Pülzl, Alkan-Olsson & Frederiksen, 2011). The SDGs successfully draw this line from science to practice, first by the process itself (see Section 2.3.3) and second by providing results of the goals, targets, and indicators for political decision making as well as scientific analysis. Actor-specific and scientific disclosure (Phase B) has been performed. Examples include corporations that disclose sustainability reports in accordance with the standard of the GRI (see Section 3.3.1; GRI, 2016) and the growing number of academic publications (Kates, 2015). Phase C has been entered but is not finalised yet, such that sustainable development remains a vision of the future (White, 2013). Useful and relevant results for society and science have been generated but are not complete. On the scientific side, not all planetary boundaries have been quantified, the concept of social boundaries demands further refinement, and the corresponding economic system has to be designed (see Section 2.2). On the societal side, for instance, the practicability and effectiveness of the SDGs have to be tested and evaluated. Future research will be discussed in Section 6.3. Although Phase C has been entered, there are bottlenecks in the science-practice linkage (Castellani, Piazzalunga & Sala, 2013; Sala et al., 2015), also called the knowledge-to-action gap (Sala et al., 2013) or the sustainability gap (Agyeman, 2005; Christie & Warburton, 2001; Hall et al., 2017). This work aims to contribute to closing this fourth research gap by easily applicable measurement methods, which will be discussed and selected in Section 3.2 et seq.

#### **2.4 Summary**

In this chapter, a six-dimensional framework of sustainable development has been developed, and three central conceptual principles of the management of sustainable development have been identified. The finalised framework includes both the descriptive-analytical and the transformational mode of sustainable development. The dimensions (1) to (3) in Figure 2.11 primarily refer to the descriptive-analytical mode, whereas dimensions (4) to (6) primarily bear upon the transformational mode. The temporal horizon (1) implies that present and forward-looking time series analysis should be incorporated instead of single points in time. The contentual domain (2) consists of several concepts. Environmental protection rests on the concept of limits, represented by the planetary boundaries. Social development is theorised by the concept of needs, captured by the social boundaries, within which the principle of justice should be applied. Combining these concepts yields the safe and just operating space for humanity, for which the green economy should be calibrated. This ideal system should be applied around the whole globe and at every regional scope (3). Sustainable development is a vision of the future that is meant to become the present as soon as possible. This requires change and transition, managed and guided by change agents (4) of every aggregational size (5), who take decisions at normative, strategic, and operational tiers (6). By including the multilevel perspective on the aggregational size of change agents and the St. Gallen management model for the decisional tiers, the perspective and the operational-to-normative gaps are closed, respectively. The conceptual management principles of societal instrumental finality (i), paradox teleological integration (ii), and practicability (iii) ought to be obeyed with regard to every dimension of the framework.
Sustainable development requires a transdisciplinary working agenda, whose main characteristic is the connection from science to practice. The SDGs are a successful transdisciplinary result. Nonetheless, two gaps remain: a knowledge gap concerning the individual sustainable development elements and their dynamic interactions, and a sustainability gap concerning the application of crafted scientific knowledge to political, entrepreneurial, and societal practice.

The next chapter, Chapter 3, deals with the measurement and assessment of contributions to sustainable development. Any pursued method should comply with the conceptual framework of sustainable development and is critically determined by the ability to address the knowledge and the sustainability gaps.


Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

### **Chapter 3**

# **Measuring and assessing contributions to sustainable development**

Measurement and assessment of sustainable development must be executed to reduce the risk of failure in the transition to sustainability. The old axiom "what gets measured gets managed" (e.g. Parris & Kates, 2003) or its reverse "what is not measured often gets ignored" (e.g. Giljum, Burger, Hinterberger, Lutter & Bruckner, 2011) prevails. Measurement and assessment address both the descriptive-analytical and the transformational mode of sustainable development (see Section 2.1; Wiek et al., 2012): They generate and structure information to serve decision making (Waas et al., 2014).

The measurement of contributions to sustainable development can involve the measurement of practices or performances (e.g. Gjølberg, 2009). Practice measurement quantifies activities, but it does not include a practice's result and is therefore unrelated to a practice's success (Gjølberg, 2009; Wood, 1991) or effectiveness. In contrast, performance measurement quantifies results that allow for inferences back to performed practices despite the absence of direct information about these practices (Searcy, 2012; Tangen, 2005). Hence, performance measurement supports managing, controlling, planning, implementing, and evaluating practices and activities (Ramos & Moreno Pires, 2013; Searcy, 2012; Tangen, 2005) that are directed towards sustainable development (Bond, Pope & Morrison-Saunders, 2015; Hacking & Guthrie, 2008). Because of this superior property, performance measurement and not practice measurement is adopted for the remainder of this work.

Besides the overarching objective to support both modes of sustainable development, there are several reasons for measurement and assessment of sustainable development. Measurement helps to better understand and interpret the current situation as well as the desired end state (Searcy, 2012; Waas et al., 2014) by enabling evaluation of progress towards goals (Kates, 2015; Searcy, 2012; Spangenberg, 2015; Vermeulen, 2018), adherence to standards (Ramos, Caeiro & Joanaz de Melo, 2004), or derivations from baselines and principles (Hacking & Guthrie, 2006, 2008). Quantification further facilitates comparison of performances (Esty, 2018; Waas et al., 2014), policy appraisal, and identification of superior regulatory approaches (Esty, 2018). Eventually, measurement serves as a basis for efficient decision making (Baumgartner, 2014; de Villiers & Hsiao, 2018; Parris & Kates, 2003; Ramos et al., 2004; Waas et al., 2014; Wu & Wu, 2012) and is thus required for goal achievement (Almássy & Pintér, 2018). Moreover, measurement and assessment results can be reported to stakeholders to reduce information asymmetries (R. Hahn & Kühnen, 2013; Maroun, 2018). Information is asymmetric when "different people know different things" (Spence, 1973; Stiglitz, 2002), and in signalling theory, "high quality firms" seek to reduce information asymmetries to increase their payoff (Connelly, Certo, Ireland & Reutzel, 2011). Above-average sustainable development performance may be signalled to stakeholders to enhance image and to build relationships, legitimacy, and accountability (see Section 2.3.2; Landrum & Ohsowski, 2018; Maroun, 2018). However, only effective green practices, and not greenwashing (the overstatement of environmental commitments), are positively correlated with firm value (Testa, Miroshnychenko, Barontini & Frey, 2018). Underperformance might lead to shame, which is the origin of the power of monitoring (Kelley & Simmons, 2015). To be in line with societal instrumental finality (see Section 2.3.2; e.g. T. Hahn & Figge, 2011), an increased payoff should not be the ultimate goal but a byproduct.

Criticism of measurement and assessment of contributions to sustainable development is scarce. One objection could be that sustainable development might be immeasurable (Bell & Morse, 2008; Böhringer & Jochem, 2007). The measurement of sustainable development depends on the body performing it, and hence, subjectivity is inevitable. Sustainable development becomes defined when measured by quantifiable variables, instead of being defined before measuring it (Bell & Morse, 2008). This finding comes into effect in the methodological choices (see Section 4.3.7.1). In contrast, temperature is an example of a measurable, pre-defined variable. In spite of this possible objection, sustainable development should be measured because the benefits dominate.

The chapter is structured as follows. In the next section, Section 3.1, principles of sustainable development measurement and assessment methods are summarised and harmonised. Thereafter, an overview of quantitative assessment methods is given in Section 3.2. The various assessment methods are evaluated against the conceptual framework (see Figure 2.11) and assessment principles (see Section 3.1) to derive the most suitable method for addressing the first four identified research gaps: First, the assessment method must be able to address the perspective gap (see Section 2.3.1); second, tackle the operational-to-normative gap (see Section 2.3.2; e.g. Baumgartner & Rauter, 2017); third, give an indication of the interlinkages of the individual sustainable development elements (knowledge gap) (see Section 2.3.3; e.g. Weitz et al., 2018); and fourth, be easily applicable in practice to close the sustainability gap (see Section 2.3.4; e.g. Hall et al., 2017). Section 3.3 gives an overview of micro, meso, and macro sustainable development indicators (see Section 3.3.1) and indices (see Section 3.3.2 and Section 3.3.3). A summary is provided in Section 3.4.

### **3.1 Principles of sustainable development measurement and assessment methods**

In 1997, a group of practitioners from the International Institute for Sustainable Development (IISD) developed principles for the measurement of sustainable development (IISD, 1997). These principles became known as the Bellagio Sustainability Assessment and Measurement Principles (Bellagio STAMP) and were updated by Pintér, Hardi, Martinuzzi and Hall (2012, 2018). The Bellagio STAMP consist of eight principles: guiding vision; essential considerations of the underlying subsystems of environment, society, and economy, including implications of synergies and trade-offs for decision making; adequate temporal and geographical scope; framework and standardised indicators that enable comparisons;<sup>18</sup> transparency of data, methods, and results; effective communication to attract a broad audience; broad stakeholder participation for legitimacy; and last, continuity and capacity of and for measurement.

Hacking and Guthrie (2008) identify the following principles in sustainable development assessment: comprehensiveness of theme coverage; integratedness of themes and techniques; and strategicness of goals, benchmarks, scales, and scope, including alternatives, cumulative impacts, and uncertainties. Sala et al. (2015) add to Hacking and Guthrie's (2008) principles boundary orientedness, stakeholder involvement, scalability, transparency, as well as objectivity and robustness in measurement.

According to Esty (2018), benchmarking must be possible across all scales and issues (i.e. along the temporal horizon, contentual domain, geographical region, and aggregational size) for understanding and judging relative performances. Benchmarking and multilevel comparability are essential to enable quantification of micro-level and meso-level contributions to the society-level concept of sustainable development (see Section 2.3.2; e.g. T. Hahn et al., 2015). Establishing a micro-to-macro connection is essential because effects on the planet (macro level) are the cumulative results of individuals (micro level) (Dahl, 2012), such that sustainable development can only be achieved if micro and meso objects contribute (Griggs et al., 2014; Sachs, 2012). Furthermore, benchmarking is important because rankings are rendered possible, preventing greenwashing, forcing objects of investigation to question their own performance, facilitating the detection of underperformance and thereby creating social pressure towards stakeholders (see above; Kelley & Simmons, 2015). Therefore, benchmarking and rankings are interpreted as drivers of behaviour and change (Becker, Saisana, Paruolo & Vandecasteele, 2017; Kelley & Simmons, 2015) by triggering motivation (Dahl, 2018), which eventually leads to progress (Esty, 2018). Interconnection of goals is necessary because individual sustainable development elements depend on each other and contribute to the overarching objective of sustainability in an unequal manner (Costanza, Fioramonti & Kubiszewski, 2016; Griggs et al., 2014; T. Hahn & Figge, 2011). Synergies and trade-offs are present. Synergies are interactions that favour each other, whereas trade-offs are interactions that hinder each other (Pradhan et al., 2017). Figge and Hahn (2004) postulate the inclusion of both relative and absolute measurement to portray efficiency as well as effectiveness, which is necessary to control for rebound effects (Berkhout, Muskens & Velthuijsen, 2000; Dyllick & Hockerts, 2002; Harangozo et al., 2018; Schneider et al., 2011). T. Hahn and Figge (2011) press for practicability of measurement tools.<sup>19</sup> For Cash et al. (2003); and Parris and Kates (2003), assessment principles are salience, credibility, and legitimacy. Salience refers to relevance of the measurement to decision makers, credibility regards the scientific and technical adequacy of measurement, and legitimacy is concerned with the stakeholders' views. Closely related are Janoušková et al.'s (2018) principles: relevance, validity, and reliability. Relevance is "the importance of something" or "the relationship of something to the matter at hand" (Janoušková et al., 2018). It functions as a selective criterion, and only relevant, important, and useful information gets observed.
Hence, relevance and its maximisation are key to human cognition (Janoušková et al., 2018; Sperber & Wilson, 1999), and relevance has become a major area in information science (Cosijn & Ingwersen, 2000; Janoušková et al., 2018). With regard to sustainable development, relevance represents the importance of the contentual domains and their individual elements (Janoušková et al., 2018). In Chapter 4 et seq., it will be revealed that this work is also shaped by information-theoretic relevance. Validity refers to the "degree to which the measurement tool measures what it claims to measure" (Janoušková et al., 2018), and reliability regards the consistency of measurement. Methodological soundness is crucial for policy or management conclusions to be accurate and non-misleading (Böhringer & Jochem, 2007; Nardo et al., 2008). Holden et al. (2017); and Spangenberg (2015) list the same principles with slightly different wording. An overview of the presented assessment principles is given in Table 3.1. The last column of Table 3.1 summarises and harmonises the various principles into one structure, which is then utilised to evaluate a quantification method's aptitude to measure and assess contributions to sustainable development by micro, meso, and macro objects of investigation. An evaluation of quantitative assessment methods follows in the next section, Section 3.2.

<sup>18</sup>Indicators play a crucial role in the assessment of sustainable development and therefore entered the Bellagio STAMP. Section 3.3 will reveal the reason for their centrality.

<sup>19</sup>Practicability entered the sustainable development framework as a conceptual principle (see Section 2.3.2). Due to its inherent conceptual and practical relevance, it is also incorporated in the assessment principles.

### **3.2 Overview of quantitative sustainable development assessment methods**

Quantitative sustainable development measurement and assessment methods can be categorised by their temporal focus (e.g. Ness, Urbel-Piirsalu, Anderberg & Olsson, 2007), methodological approach (e.g. Sala et al., 2015), or measurement unit (e.g. Gasparatos & Scolobig, 2012). Because this work aims to implement the multilevel perspective (see Section 2.3.1; Rotmans et al., 2001), a categorisation by the aggregational size of an object of investigation is expedient. Figure 3.1 gives an overview of micro, meso, macro, and multilevel assessment methods.

As only multilevel methods are relevant, single level assessment methods are not further explained but only listed.

At the micro level, products or projects might be assessed. Major techniques for product assessment include life cycle costing, life cycle assessment, and contingent valuation. Details on these methods can be found in, e.g. Curran (1996); Finnveden et al. (2009); Finnveden and Moberg (2005); Finnveden and Östlund (1997); McWilliams and Siegel (2011); Ness et al. (2007); and Patterson et al. (2017). Projects can be appraised by cost benefit analysis or various impact assessment methods, such as environmental impact assessment or integrated sustainability assessment (e.g. Finnveden & Moberg, 2005; Ness et al., 2007; Petts, 1999a, 1999b; Pope et al., 2017; Sala et al., 2015; Weaver & Rotmans, 2006). Assessment tools for corporations include, for example, the sustainable value added and measures for relative sustainable performance (Cubas-Díaz & Martínez Sedano, 2018; Figge & Hahn, 2004; T. Hahn & Figge, 2011). Policies, plans, and programmes can be evaluated by the strategic environmental impact assessment (e.g. Finnveden & Moberg, 2005; Ness et al., 2007; Partidário, 1999; Therivel & Partidário, 1996). Probably the most prominent example of macro-level measurement is the ecological footprint by Wackernagel and Rees (1996).<sup>20</sup> Other macro-level environmental accounting or green accounting methods include the adjusted national accounts, in which key figures such as the GDP or the Net Domestic Product (NDP) and the Gross National Income (GNI) or the Net National Income (NNI) are greened (e.g. Bartelmus, 2018; Finnveden & Moberg, 2005; Hanley, 2000; Hueting & de Boer, 2018; Singh et al., 2012). Input-output analysis as well as system assessment and modelling, including vulnerability analysis, multiagent simulation models, Bayesian network models, and system dynamic models, are further macro tools (e.g.
Boulanger & Bréchet, 2005; Costanza, Daly et al., 2016; Finnveden & Moberg, 2005; Ness et al., 2007; Patterson et al., 2017; Todorov & Marinova, 2011; Turner II et al., 2003).

Multilevel methods comprise, for instance, regression analysis, full cost accounting, material flow accounting, indicator sets, footprints, as well as risk and uncertainty analysis. Regression analysis studies the relationship of variables. Typically, there is one dependent variable and one or more independent variables. Examples in the field of sustainable development involve Aşıcı (2013); dos Santos Gaspar, Cardoso Marques and Fuinhas (2017); Gao and Bansal (2013); Godos-Díez et al. (2011); M. V. López, Garcia and Rodriguez (2007); Menegaki and Ozturk (2013); Menegaki and Tiwari (2017); and Testa et al. (2018). Because regression analysis requires a dependent variable and focuses on the relationship of few variables, it is neither suitable for nor able to capture the multiple facets of sustainable development. However, investigating relationships of variables (i.e. their synergies and trade-offs) remains important in closing the knowledge gap (see Section 2.3.3; e.g. Weitz et al., 2018). Full cost accounting is the assessment of costs arising from all three contentual domains. This method generally complies with the conceptual framework (see Figure 2.11) but involves the conversion of non-monetary units, such as physical units stemming from the environmental domain, to monetary units (e.g. G. D. Atkinson, 2000; Ness et al., 2007). Reasons for avoidance of this procedure will be discussed in Section 4.3.4. Material flow accounting deals with the flow of materials in production processes. Energy analysis, emergy analysis, and exergy analysis are examples of this method (e.g. Finnveden & Moberg, 2005; Finnveden & Östlund, 1997; Ness et al., 2007; Odum, 1996; Patterson et al., 2017; Wu & Wu, 2012). Due to its focus on materials, other elements of sustainable development are disregarded, and thus, a comprehensive picture of sustainable development cannot be drawn.

<sup>20</sup>The ecological footprint is often listed as an index (e.g. Saisana & Philippas, 2012; Singh, Murty, Gupta & Dikshit, 2012). However, Wackernagel et al. (2018) clarify it to be an environmental accounting system.
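The role of pairwise relationships in identifying synergies and trade-offs can be illustrated with a minimal sketch. It assumes that all indicators are oriented such that higher values are better, uses Spearman's rank correlation as one possible measure of association, and applies a purely illustrative threshold of 0.6. The function names and the sample data are hypothetical and are not taken from the cited studies.

```python
from math import sqrt


def ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equally long, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def classify(rho, threshold=0.6):
    """Label an indicator pair; the threshold is an illustrative assumption."""
    if rho > threshold:
        return "synergy"
    if rho < -threshold:
        return "trade-off"
    return "unclassified"


# Hypothetical yearly scores of one object of investigation (higher = better).
education = [0.52, 0.55, 0.61, 0.66, 0.70]
health = [0.48, 0.50, 0.57, 0.63, 0.69]
climate = [0.80, 0.74, 0.71, 0.65, 0.60]

print(classify(spearman(education, health)))   # both series improve together
print(classify(spearman(education, climate)))  # one improves while the other worsens
```

Unlike regression, this pairwise view needs no dependent variable, but it only indicates association and not causation. A constant series would make the correlation undefined and would have to be excluded beforehand.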

Indicator sets have played an important role in the debate on sustainable development assessment.<sup>21</sup> Practitioners as well as scientific scholars have demanded the deployment of sustainable development indicators as a solid base for decision making since the 1990s (e.g. Antonini & Larrinaga, 2017; Baumgartner, 2014; Böhringer & Jochem, 2007; Cabeza Gutés, 1996; Costanza, Fioramonti & Kubiszewski, 2016; Eurostat, 2018; Kelley & Simmons, 2015; Nardo et al., 2008; Parris & Kates, 2003; Ramos & Moreno Pires, 2013; Singh et al., 2012; Spangenberg, 2015; UNCED, 1992; UNEP, 2011; Vermeulen, 2018; Wu & Wu, 2012). The reasons for this urge are manifold. Indicator sets generally have a high potential to comply with the sustainable development framework (see Figure 2.11) and the assessment principles (see Table 3.1). Indicators can be easily computed for a time series, the multiple facets of the contentual domains can be represented by individual indicators, an indicator set can be repetitively computed for diverse geographical regions, and indicators are, when designed accordingly (see Section 4.3.2 and Section 4.3.4), capable of applying the multilevel perspective (see Section 2.3.1; Rotmans et al., 2001), ensuring object comparability. Moreover, each change agent group can contribute to the establishment and the use of indicators. Businesses may be objects of investigation and change agents simultaneously. On behalf of society, policy and science may decide upon the design of the indicator set or compute the set to draw conclusions for management and policy making. Indicators further serve the last dimension of the sustainable development space: With indicators, the (often-forgotten) strategic tier can be addressed in addition to the operational tier (Baumgartner, 2014) because indicators can measure distances to strategic goals. Thereby, the operational-to-normative gap (see Section 2.3.2; e.g. Baumgartner & Rauter, 2017) is tackled. The normative tier does not need to be managed by the measurement because sustainable development indicators are inherently normative (Bakkes et al., 1994; Waas et al., 2014). The normative tier is a prerequisite dealt with in the conceptual phase (see Section 2.3.2) and later on reflected by the methodology (see Chapter 4). Indicator sets can follow societal instrumental finality by linking indicator targets to societal targets. For instance, thresholds of the planetary boundaries can be broken down into thresholds for micro, meso, or macro objects of investigation (e.g. O'Neill et al., 2018; Whiteman et al., 2013). However, further research is needed in this field (see Section 4.3.6.2) and will be discussed in Section 6.3. In fact, Section 3.3 will reveal that the possible linkage to reference values (i.e. targets and boundaries) is the defining feature of indicators. Paradox teleological integration and the acknowledgement of the coexistence of oppositional elements can be managed by individually pursuing targets of the indicators. Exploring sustainable development elements' synergies and trade-offs can be achieved by including a composite measure in an indicator set (Costanza, Fioramonti & Kubiszewski, 2016; T. Hahn & Figge, 2011). Portraying both efficiency and effectiveness is feasible by incorporating relative as well as absolute values. With relative measures, relative decoupling of economic growth and environmental degradation (see Section 2.2.3) can be managed, a major challenge for decision makers (Holden et al., 2014).
Including absolute, non-standardised measures implies sacrificing comparability and may therefore be realised only to some extent. Section 4.3.4 will further discuss this conflict. Given indicators' simplicity, they are practicable in computation, viable in stakeholder participation and consensus building (Parris & Kates, 2003), and effective in communication with the public at large (Spangenberg, 2015). The sustainability gap (see Section 2.3.4; e.g. Hall et al., 2017) can thus be closed. Transparency and methodological soundness can be in place for any measurement method.

<sup>21</sup>Technical terms and definitions of indicators will be introduced in Section 3.3.
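How an indicator can measure the distance to a strategic goal, and why relative and absolute values complement each other, can be sketched as follows. This is a minimal illustration under stated assumptions: the linear distance-to-target formula, the function names, and all figures are hypothetical and do not reproduce the methodology developed in Chapter 4.

```python
def distance_to_target(value, baseline, target):
    """Share of the path from baseline to target already covered, clipped to [0, 1]."""
    if target == baseline:
        raise ValueError("target must differ from baseline")
    progress = (value - baseline) / (target - baseline)
    return max(0.0, min(1.0, progress))


def intensity(absolute_impact, economic_output):
    """Relative measure: impact per unit of economic output (efficiency view)."""
    return absolute_impact / economic_output


# Hypothetical firm: emissions in kt CO2, value added in million EUR.
emissions = {"year_1": 100.0, "year_2": 110.0}
value_added = {"year_1": 50.0, "year_2": 66.0}

relative_1 = intensity(emissions["year_1"], value_added["year_1"])
relative_2 = intensity(emissions["year_2"], value_added["year_2"])

# Relative decoupling: efficiency improved, yet the absolute burden grew.
efficiency_improved = relative_2 < relative_1
absolute_burden_grew = emissions["year_2"] > emissions["year_1"]

# Progress towards a hypothetical strategic target of 40 kt, starting from 100 kt.
progress = distance_to_target(70.0, baseline=100.0, target=40.0)

print(efficiency_improved, absolute_burden_grew, progress)
```

The example shows why a relative measure alone can mask growing absolute impacts: only the absolute value, read against a target, reveals whether the object of investigation is effective and not merely efficient.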

The main advantage of including a composite measure in an indicator set is the exploration of synergies and trade-offs, thereby addressing the knowledge gap (see Section 2.3.3; e.g. Weitz et al., 2018). Furthermore, comprising various indicators in an index implies presenting complexity in simple ways (Bell & Morse, 2018): A composite measure is a compressed description of a multidimensional state (Ebert & Welsch, 2004), providing a simple summary picture (Becker et al., 2017). Thereby, the important focus in measurement is recaptured (Griggs et al., 2014), such that a better understanding of the data is obtained (Jesinghaus, 2018), countering the risk that a rich indicator set causes more confusion than understanding (Wu & Wu, 2012). Almássy and Pintér (2018); Costanza, Fioramonti and Kubiszewski (2016); Hanley et al. (1999); Nardo et al. (2008); and Ramos and Moreno Pires (2013) even argue that sustainable development necessarily requires an index because it is a multifaceted concept that cannot be captured by standalone indicators, and GDP as a measure of wellbeing needs to be replaced. Moreover, an index facilitates benchmarking (Almássy & Pintér, 2018; Ebert & Welsch, 2004), decision making (Bolis, Morioka & Sznelwar, 2017), and communication with policy, management, and the public (Becker et al., 2017; Moldan & Dahl, 2007; Ramos & Moreno Pires, 2013; Schmidt-Traub et al., 2017a).

Despite the manifold benefits, indicators and indices are critically discussed in the literature. First, (composite) indicators may not always be objective, precise, or certain. Subjectivity is inevitable (see Chapter 3; Bell & Morse, 2008) because it originates in the choices taken over the indicator computation method (Bondarchik, Jabłońska-Sabuka, Linnanen & Kauranne, 2016; Singh et al., 2012; Waas et al., 2014; Wu & Wu, 2012). Precision cannot be proven because sustainable development only becomes defined when measured (see Chapter 3; Bell & Morse, 2008). Uncertainty cannot be eliminated but only accounted for (see below). Second, indices are criticised for their defining characteristic: Aggregation implies weak sustainability, such that underperformance in one aspect can be compensated by overperformance in another aspect (Holden et al., 2017). This mechanism grants decision makers mediating power, and they might be tempted to set low weights on underperforming elements and high weights on overperforming elements (Jesinghaus, 2018). Two objections can be raised against this criticism: On the one hand, non-compensatory aggregation functions that do not allow for compensation may be applied (see Section 4.3.8; Pollesch & Dale, 2015), and on the other hand, weak sustainability is permitted within the safe and just operating space in any case (see Section 2.2.4). Moreover, full freedom in weight definition should not be granted (Rogge, 2012), but weights should be set universally to minimise arbitrariness and subjectivity as well as to ensure comparability. Universal validity of weights (as well as, e.g. outlier handling) will be further discussed in Section 4.3.5 and Section 4.3.7. Third, given the complexity reduction, indices may invite narrow-minded pathways and simplistic management and policy conclusions (Nardo et al., 2008; Spangenberg, 2015). To counter this argument, conclusions should always be double-checked against the subjacent layers.
Finally, the computation of a meaningful, methodologically sound index is difficult (Ebert & Welsch, 2004), and therefore, the computation of a sustainable development index might not be practicable for all change agent groups. Support might be required. A summary of the evaluation of indicator sets against the assessment principles is visualised in Figure 3.2a. Towards the interior of the radar chart, the assessment method is not capable of fulfilling the principle, and at the exterior, it is qualified to accomplish the principle.
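The compensation mechanism behind the weak sustainability criticism can be illustrated with a small numerical sketch. It assumes three element scores on a common scale from 0 to 1 and equal weights; the function names and all numbers are hypothetical and chosen for illustration only, and the aggregation functions actually considered in this work are discussed in Section 4.3.8.

```python
import math


def arithmetic_mean(scores, weights):
    """Weighted arithmetic mean: fully compensatory aggregation."""
    return sum(w * s for s, w in zip(scores, weights))


def geometric_mean(scores, weights):
    """Weighted geometric mean: compensation is only partially possible."""
    return math.prod(s ** w for s, w in zip(scores, weights))


def minimum(scores):
    """Minimum operator: non-compensatory, the weakest element decides."""
    return min(scores)


# Hypothetical scores of three sustainable development elements (0 = worst, 1 = best).
weights = [1 / 3, 1 / 3, 1 / 3]
balanced = [0.6, 0.6, 0.6]
unbalanced = [1.0, 0.7, 0.1]  # one strongly underperforming element

for label, scores in (("balanced", balanced), ("unbalanced", unbalanced)):
    print(
        label,
        round(arithmetic_mean(scores, weights), 3),
        round(geometric_mean(scores, weights), 3),
        minimum(scores),
    )
```

Under the arithmetic mean, both profiles obtain the same index value, so the underperforming element is fully offset. The geometric mean penalises the unbalanced profile, and the minimum operator rules out compensation entirely.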

A footprint is the quantification of direct and indirect effects of human activity on, for example, global warming (carbon footprint) or water reserves (water footprint) (e.g. Čuček, Klemeš & Kravanja, 2012; Ewing et al., 2012; Galli, Weinzettel, Cranston & Ercin, 2013; Galli et al., 2012; Patterson et al., 2017). Given the possibility of computing a footprint for many variables and aggregating them into one composite measure, similar to indicator sets, footprints have a high potential of being in line with the conceptual framework (see Figure 2.11) and assessment principles (see Table 3.1). In contrast to indicators, footprints are informationally richer because they additionally include indirect effects. GRI (2016) sets the corporate standard to include upstream and downstream effects of direct suppliers and direct consumers (see Section 3.3.1). Though, to quantify total indirect effects of the entire value chain of upstream supply and downstream consumption, process methods or input-output analysis have to be applied and performed (Patterson et al., 2017). Similar to the computation of a sustainable development index, the computation of footprints might not be practicable for every change agent group. However, footprints do not produce easily understandable results as indices do, but outputs are rather complex. Stakeholders can neither be involved for acquiring legitimacy nor are footprints effective in communication. The analysis of footprints' compliance with the sustainable development assessment principles is shown in Figure 3.2b. Last, risk and uncertainty analysis are multilevel analyses, which can and should be performed after finalising any assessment in order to evaluate and minimise potential risks (Ness et al., 2007).

**Figure 3.2** Capability evaluation of assessment principle compliance by indicator sets and footprints (based on Sala et al., 2015; with friendly permission of © 2015 The Authors)

In conclusion, indicator sets that include a composite measure are the most successful assessment method in comprehensively quantifying sustainable development and tackling the first four identified research gaps: Comparability of micro, meso, and macro objects is ensured (perspective gap; see Section 2.3.1), each decisional tier can be addressed (operational-to-normative gap; see Section 2.3.2; e.g. Baumgartner & Rauter, 2017), synergies and trade-offs can be explored (knowledge gap; see Section 2.3.3; e.g. Weitz et al., 2018), and indicators are easily applicable (sustainability gap; see Section 2.3.4; e.g. Hall et al., 2017). In this respect, this work concentrates on sustainable development indicators and indices. The next section, Section 3.3, reviews previous indicator frameworks and indices. As a concluding remark, it is emphasised that the other presented methods are also valuable in the analysis of and transformation towards sustainable development. For instance, life cycle assessment is a crucial approach at micro level, indirectly supporting the macro SDG 12 on responsible consumption and production. Standalone micro, meso, and macro assessment approaches should complement multilevel methods.

#### **3.3 Sustainable development indicators**

An indicator is an operationalisation of a system characteristic (Gallopín, 1997; Waas et al., 2014; Wu & Wu, 2012), and an indicator set is a group of indicators used for a particular purpose (Wu & Wu, 2012). An indicator can be a composite indicator, also called index, which is a function of its underlying indicators (Saltelli et al., 2008; Waas et al., 2014). As already pointed out in Section 3.2, comparability to reference values is the defining feature of indicators: A variable becomes an indicator when it is linked to a reference value or a benchmark (Waas et al., 2014). These can be targets or thresholds, expressing a normal or a desired state. Consequently, an indicator can assess progress while a variable cannot. To determine useful reference values, system knowledge and understanding are necessary (Wu & Wu, 2012). Examples of such system knowledge are the planetary boundaries at macro level (see Section 2.2.1; Steffen et al., 2015) and industry benchmarks, which make it possible to judge and pin down a corporation's performance at meso level (Cubas-Díaz & Martínez Sedano, 2018; Figge & Hahn, 2004).
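The distinction between a variable and an indicator can be made concrete with a minimal sketch. It assumes a hypothetical `Indicator` type that couples an observed value with a reference value; the class, its field names, and the numbers are illustrative assumptions and do not stem from the cited literature.

```python
from dataclasses import dataclass


@dataclass
class Indicator:
    """A variable linked to a reference value (a target or a threshold)."""

    name: str
    value: float            # the observed variable
    reference: float        # target or threshold expressing the desired state
    higher_is_better: bool = True

    def meets_reference(self) -> bool:
        """Check whether the observed value satisfies the reference value."""
        if self.higher_is_better:
            return self.value >= self.reference
        return self.value <= self.reference

    def gap(self) -> float:
        """Signed distance to the reference; negative values indicate a shortfall."""
        diff = self.value - self.reference
        return diff if self.higher_is_better else -diff


# Hypothetical observations: on their own, 92.0 and 6.5 are mere variables.
literacy = Indicator("literacy rate (%)", value=92.0, reference=100.0)
emissions = Indicator("GHG emissions p.c. (t)", value=6.5, reference=2.0,
                      higher_is_better=False)

for indicator in (literacy, emissions):
    print(indicator.name, indicator.meets_reference(), indicator.gap())
```

Only the link to the reference value allows a statement on progress: Without it, the two observations could not be judged as falling short of a desired state.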

The next sections review meso (composite) indicators (see Section 3.3.1 and Section 3.3.2) and macro indices (see Section 3.3.3) and examine their conformity with the assessment principles (see Table 3.1). Reference to synergies and trade-offs is not made because they are inherent in indices (see Section 4.3.7). Methodological soundness will be investigated in Section 4.2. Macro indicator frameworks are not included in this section as the most elaborated framework – the SDGs – has been covered in Section 2.3.3. This section contains no subsection on micro or multilevel indices. Micro indicator frameworks could not be identified, and only one micro index – the Better Life Index (BLI) (OECD, 2017) – could be detected. It is listed along with macro subjective indices in Section 3.3.3. Multilevel indices could not be traced at all; disregarding the multilevel perspective (see Section 2.3.1; Rotmans et al., 2001) is a general shortcoming of sustainable development measurement and assessment methods, and addressing it is consequently the main theoretical, methodological, and empirical contribution of this work.

#### **3.3.1 Corporate indicator frameworks**

Indicator frameworks can serve management control purposes (Parris & Kates, 2003) and are therefore used by corporations to integrate sustainable development into strategy (e.g. Bui & de Villiers, 2018; Gond, Grubnic, Herzig & Moon, 2012; Wijethilake, 2017; Witjes et al., 2017). The most widely used standard for corporate reporting on sustainable development indicators is the GRI framework, used by 63% of reporting companies in 2017 (KPMG, 2017).<sup>22</sup> The GRI standard was established in the 1990s with the goal to provide a trusted and credible framework (Ogata, Inoue, Ueda & Yagi, 2018) that "can be used by an organisation of any size, type, sector, or geographic location" (GRI, 2016) to quantify corporate contributions to sustainable development. The framework is divided into six disclosures: an organisation's reporting principles, reporting practices, management approach, and indicators of the three contentual domains. Details on the currently valid standard can be found in GRI (2016).<sup>23</sup> Given the large variety of topic coverage, the GRI framework can be considered as comprehensively picturing sustainable development contributions. Within the world of business, comparability is enhanced by creating a common language (GRI, 2016). However, the framework is criticised for following the business case of sustainable development (Landrum & Ohsowski, 2018) instead of engaging in societal instrumental finality and paradox teleological integration. The author of this work does not agree with this criticism for two reasons. First, the GRI standard is a reporting standard that does not provide integrated information on the importance of the individually reported indicators, such that dominance of one aspect over the other is not a subject matter.
Second, reports are released to guide business in their alignment with the societal level SDGs (GRI & UNGC, 2018a, 2018b; GRI, UNGC & WBCSD, 2015, 2017), which follow societal instrumental finality and paradox teleological integration by definition. Antonini and Larrinaga (2017) criticise GRI reports for not including boundary values. In response, it should be noted that the science community is required to derive meaningful corporate boundaries from the macro level; first research exists, but more work is necessary to integrate boundaries into corporate practice (see Section 3.2 and Section 6.3; e.g. Haffar & Searcy, 2018; Whiteman et al., 2013).

Further sustainable development reporting standards for corporations include, for instance, the Prince's Accounting for Sustainability Project (A4S), Integrated Reporting <IR> by the International Integrated Reporting Council (IIRC), and the Sustainability Accounting Standards Board (SASB) (A4S, 2018; IIRC, 2013; Ogata et al., 2018; SASB, 2018). These are not further considered because of their deviating focus (e.g. on finance and investment). An overview of corporate reporting tools on sustainable development can be found in, e.g. Siew (2015).

<sup>22</sup>Sample: 4,900 companies, comprising the top 100 companies in terms of revenues in each of 49 countries.

<sup>23</sup>Minor updates will become effective in 2021 (GRI, 2019).

**Figure 3.3** Evaluation of assessment principle compliance by meso-level indices of sustainable development

#### **3.3.2 Meso-level indices**

According to the multilevel perspective by Rotmans et al. (2001; see Section 2.3.1), meso-level indices are metrics for networks, communities, or organisations such as corporations. Two expedient meso-level indices for the assessment of sustainable development contributions by corporations are identified and discussed in the following.

The family of the DJSI aims to provide investors with benchmarks of corporate performances for "managing their sustainability investment portfolios" (S&P Dow Jones Indices, 2018). Aspects of sustainable development are widely covered (RobecoSAM, 2018a). However, the indices' objective misses the conceptual framework of sustainable development by definition: Societal instrumental finality is clearly not the purpose but management of investment is (RobecoSAM, 2019). The non-transparent presentation of the DJSI hampers its evaluation against the assessment principles. Neither the methodology report (S&P Dow Jones Indices, 2018) nor further documents available on the RobecoSAM website (RobecoSAM, 2018c) deliver a clear picture. Examining the available information, it seems that the DJSI involve both efficiency and effectiveness measures. However, it seems that the DJSI are neither comparable,<sup>24</sup> nor target oriented or practicable, but corporations can apply and are invited for an assessment. Therefore, stakeholder involvement is reduced. Effective communication may also be harmed, given the great number of indices and low transparency. In conclusion, the DJSI are inappropriate instruments for assessing corporate contributions to sustainable development. However, they may be valuable for investors. Figure 3.3a summarises the DJSI's properties, evaluated against the assessment principles.

In contrast, the ICSD was explicitly developed to monitor corporate contributions to sustainable development (Krajnc & Glavič, 2005). The data input of this index is based on the GRI framework, generally ensuring data quality and coverage of the three contentual domains. However, the social domain is not sufficiently dealt with, for example, aspects concerning equality (SDG 5 on gender equality) are missing. Furthermore, profits enter the economic domain despite the fact that they are not key to sustainable development (see Section 2.2.3 and Section 2.3.2; e.g. Vermeulen, 2018). Comparability is not ensured because indicators are standardised to the unit of production, which is further discussed in Section 4.3.4. However, absolute as well as relative values are included, and targets are set. Given the ICSD's transparency and simple structure, this index is practicable (as far as possible, see Figure 3.2a), suitable for stakeholder involvement, and effective in communication. The appraisal of this index against the assessment principles is visualised in Figure 3.3b.

<sup>24</sup>This conclusion is drawn from the floating and industry-specific weights (see Section 4.2; RobecoSAM, 2018b; S&P Dow Jones Indices, 2018, 2019).

Several authors engage in the construction of corporate social responsibility indices (e.g. Amor-Esteban, Galindo-Villardón & García-Sánchez, 2018; Gjølberg, 2009; Ruf, Muralidhar & Paul, 1998; Skouloudis, Isaac & Evaggelinos, 2016). Such indices generally fail in complying with the conceptual framework because corporate social responsibility seeks to eliminate negative effects of businesses instead of actively contributing to sustainable development (see Section 2.3.2; e.g. Bansal & Song, 2017). Further indices can be found in, e.g. Singh et al. (2012). However, these indices are unrewarding for the comparable measurement of contributions to sustainable development by micro, meso, and macro objects and are thus not further investigated.

#### **3.3.3 Macro-level indices**

GDP plays a central role in macro-level measurement of sustainable development because GDP is the most widely used measure of macro-economic performances (see Section 2.2.3; Giannetti et al., 2015). Macro-level measures of sustainable development seek to replace GDP by going beyond economic performance and are thus called GDP alternatives. The SDGs might be a potential vehicle for GDP alternatives, which can be classified into three types: adjusted economic measures, subjective measures of wellbeing, and weighted composite indicators of wellbeing (Costanza et al., 2014). Adjusted economic measures are macro-economic measures in monetary units that are supplemented with environmental and social aspects. Examples include the Eco Domestic Product (EDP) (e.g. Hanley, 2000), Genuine Progress Indicator (GP) (e.g. Lawn, 2003), Genuine Savings Indicator (GS) (e.g. Pearce & Atkinson, 1993; Pearce, Hamilton & Atkinson, 2001), Index of Sustainable Economic Welfare (ISEW) (e.g. Beça & Santos, 2010; Costanza & Daly, 1992; H. E. Daly & Cobb, 1989), Inclusive Wealth Index (IW) (e.g. Dasgupta, 2010), and the Sustainable Net Benefit Index (SNBI) (e.g. Böhringer & Jochem, 2007; Mayer, 2008; Saisana & Philippas, 2012; Singh et al., 2012; van den Bergh, 2009). As this type of measure can only be applied at the macro level and quantifies sustainable economic welfare instead of sustainable development as a whole (Lawn, 2003), it cannot serve the research question of the present work. Subjective welfare measures are survey-based metrics and aspire to quantify subjective wellbeing. The BLI (e.g. OECD, 2017),<sup>25</sup> Compass Index of Sustainability (CIS) (e.g. Atkisson & Hatcher, 2001), Gross National Happiness (GNH) (e.g. CBS & GNH Research, 2016), and the Happy Planet Index (HPI) (e.g. Bondarchik et al., 2016; NEF, 2012) are examples of (at least partially) subjective welfare measures. Subjective wellbeing highly varies between societies and cultures. A universal and comparable measure is difficult to obtain (Costanza et al., 2014), which is not in line with the conceptual framework of being universally applicable (see Section 2.1; WSSD, 2002) and the assessment principle objectivity (see Table 3.1; Sala et al., 2015). Thus, subjective measures of welfare are not further considered. Last, weighted composite indicators of wellbeing give a comprehensive picture of sustainable societal wellbeing (Costanza et al., 2014), capturing the notion of sustainable development as a whole. A prerequisite for comprehensiveness is the inclusion of the three contentual domains. Indices that omit one domain are disregarded. Examples include the Composite Environmental Performance Index (CEPI) (e.g. García-Sánchez, das Neves Almeida & de Barros Camara, 2015), Environmental Performance Index (EPI) (e.g. Esty & Emerson, 2018), Environmental Sustainability Index (ESI), Environmental Vulnerability Index (EVI) (e.g. Dahl, 2018), and the Living Planet Index (LPI) (e.g. WWF, 1998). Moreover, the suggestion that both subjective and objective indicators should be integrated (Costanza et al., 2007; Costanza et al., 2014) is not followed because it would violate the assessment principle objectivity (see Table 3.1; Sala et al., 2015).
In the following, seven macro-level indices that include the three contentual domains of sustainable development are examined: the Fondazione Eni Enrico Mattei Sustainability Index (FEEM SI), Human Sustainable Development Index (HSDI), Mega Index of Sustainable Development (MISD), SDGI, Sustainable Development Index (SDI), SSI, and the Wellbeing Index (WI). An overview on the mentioned GDP alternatives, sorted by their capability of capturing sustainable development, is displayed in Figure 3.4.

The FEEM SI is an index that projects future evolution of macro-economic contributions to sustainable development by being based on a general equilibrium model. It is able to generate scenarios under different policy assumptions (Carraro et al., 2013; Pinar, Cruciani, Giove & Sostero, 2014) and is therefore a macro-economic tool that supports target setting and policy making for the transition to sustainability. It can neither be transferred to micro nor meso objects but disregards the multilevel perspective (see Figure 3.5a). Because of the modelling complexity, it is neither practicable, effective in communication, nor can stakeholders be involved. On the positive side, the index includes efficiency as well as effectiveness and is transparent.

The HSDI is a composite measure that investigates the aggregate of four indicators: life expectancy at birth, years of schooling, purchasing power adjusted GDP p.c., and Greenhouse Gas (GHG) emissions p.c. (Bravo, 2014, 2018; Singh et al., 2012; Togtokh, 2011; Togtokh & Gaffney, 2010; UNDP, 1990).<sup>26</sup> Given its few variables, this index is neither able to comprehensively map the environmental domain (Bravo, 2014, 2018) nor sustainable development as a whole (see Figure 3.5b). Furthermore, the index cannot be computed in a meaningful way for businesses. However, it can be universally applied to different regions, it includes absolute values (e.g. life expectancy at birth) and relative values (e.g. GHG emissions p.c.), and targets and boundaries are set (e.g. 100% literacy rate) (UNDP, 1990). Given the HSDI's simplicity, it is practicable (as far as possible, see Figure 3.2a), stakeholders can be involved, and results can be communicated effectively. Its methodology and data are transparent.

<sup>25</sup>The BLI is a micro index quantifying "whether life is getting better for people" (OECD, 2017). It is listed in this section as it is the only identified micro index (see Section 3.3).

**Figure 3.4** Overview of Gross Domestic Product (GDP) alternatives

The MISD is a function of 31 known indices (Shaker, 2015, 2018), which makes an evaluation with the assessment principles difficult. Transparency is only given partially, and the principles comparability, efficiency and effectiveness, as well as target and boundary orientedness remain unknown (see Figure 3.5c). A mega index is not practicable because a huge variety of methods are implemented. The complexity also harms stakeholder involvement and effective communication.

Apart from the Global Burden of Disease Index (GBDI), which is a health-related index, the SDGI is the only index that is clearly linked to the SDGs (Lim et al., 2016; Schmidt-Traub et al., 2017a, 2017b). Therefore, it is a highly relevant candidate in comparably quantifying contributions to sustainable development. By definition, it maps the sustainable development domains well and is universally applicable to any geographical region (see Figure 3.5d). However, its macro-economic focus and resulting indicator selection prevent it from being applicable to micro and meso objects. Efficiency as well as effectiveness are measured, targets are included in terms of the SDG agenda or top five performers, and the transparent presentation enables stakeholder involvement as well as effective communication.

<sup>26</sup>The HSDI is a successor of the Human Development Index (HDI), which did not include GHG emissions p.c. (UNDP, 1990).

**Figure 3.5** Panels: **(a)** FEEM Sustainability Index (FEEM SI) (e.g. Pinar et al., 2014); **(b)** Human Sustainable Development Index (HSDI) (e.g. Bravo, 2018); **(c)** Mega Index of Sustainable Development (MISD) (e.g. Shaker, 2018); **(d)** Sustainable Development Goal Index (SDGI) (e.g. Schmidt-Traub et al., 2017a); **(e)** Sustainable Development Index (SDI) (Bolcárová & Kološta, 2015); **(f)** Sustainable Society Index (SSI) (e.g. van de Kerk et al., 2014)

The SDI aims to quantify a country's contribution to macro-level sustainable development. It includes 12 indicators in areas such as socio-economic development, sustainable consumption and production, social inclusion, demographic changes, public health, climate change and energy, sustainable transport, natural resources, and global partnership (Bolcárová & Kološta, 2015). The SDI maps the contentual domains of sustainable development well and is universally applicable to different countries (see Figure 3.5e). However, given its indicator selection, a computation for micro and meso objects is not possible, such that comparability across aggregational sizes is not enabled. Absolute and relative indicators are present, but targets and boundaries are not included. Its simplicity further ensures practicability (as far as possible, see Figure 3.2a), stakeholder involvement, and effective communication. The assessment principle transparency is complied with.

The SSI also aspires to measure macro-level sustainable development of countries, and contains 21 indicators in the categories basic needs, health, personal and social development, natural resources, climate and energy, transition, and economy (Saisana & Philippas, 2012; van de Kerk & Manuel, 2008; van de Kerk et al., 2014). It generally complies with the conceptual framework by depicting the contentual domains well and by being universally applicable; scores for 151 countries are computed (see Figure 3.5f). However, it is only computable for macro objects, and the multilevel perspective is dismissed. Data and methods are transparently disclosed; targets are included in terms of a sustainability value, and efficiency as well as effectiveness are included. Practicability, stakeholder involvement, and effective communication are ensured.

Last, the WI is an index that comprises 87 indicators, thereof 36 indicators that summarise human wellbeing and 51 indicators that aggregate into ecosystem wellbeing. Topics covered are health and population, wealth, knowledge and culture, community, equity, land, water, air, species and genes, and resource use (Mayer, 2008; Prescott-Allen, 2001). The contentual domains of sustainable development are mapped well, but this index features the same shortcomings as the previously mentioned indices: It is not compliant with the multilevel perspective, disabling comparability across aggregational sizes (see Figure 3.5g). However, the WI is in line with the further assessment principles.

In summary, the review yields the following conclusions:


**Figure 3.6** Ranking of sustainable development indices by assessment principle compliance

they are generally comprehensive.


Figure 3.6 ranks the investigated sustainable development indices by their compliance with the assessment principles, sorted by their aggregational sizes.

#### **3.4 Summary**

Measurement and assessment of sustainable development is inevitable; only what is measured can be managed. With measurement and assessment, both modes of sustainable development – the descriptive-analytical and the transformational mode – are addressed. Knowledge is generated to serve informed decision making. In search of suitable assessment methods, the first four identified research gaps provide guidance: First, a sustainable development assessment method is required to comparably measure contributions to sustainable development by micro, meso, and macro objects (perspective gap); second, it must be capable of supporting decisions at operational, strategic, and normative tier (operational-to-normative gap); third, it is demanded to investigate interlinkages of the individual sustainable development elements (knowledge gap); and fourth, it must be easily applicable to put the crafted knowledge into practice (sustainability gap). To be able to systematically determine a method's potential in approaching these gaps, sustainable development assessment principles are reviewed first. Summarising and harmonising this review yields ten assessment principles: compliance with framework; comparability in all sustainable development dimensions; synergies and trade-offs of interconnected themes and goals; efficiency and effectiveness of impacts; target and boundary orientedness of individual sustainable development elements; practicability for decision makers; stakeholder involvement for legitimacy; effective communication to stakeholders; transparency of data, methods, and results; and methodological soundness. Second, multilevel assessment methods are evaluated based on these principles. Indicator sets that include a weighted composite indicator (i.e. a sustainable development index) prove to be the most successful assessment method in tackling the first four identified research gaps. Two meso-level and seven macro-level indices are identified: the DJSI, ICSD, FEEM SI, HSDI, MISD, SDGI, SDI, SSI, and the WI. Examining these indices, substantial deficiencies in complying with the assessment principles are ascertained. These involve, for instance, the non-comprehensive depiction of sustainable development elements, the violation of societal instrumental finality, and a lack of transparency. Moreover, multilevel indices could not be identified in the literature despite their compelling necessity, demonstrating the expansion of the perspective gap, which regards the conceptual framework, into methods and empirical findings. The multilevel perspective is neglected in the conceptual framework, leading to an absence of multilevel indices.
This in turn results in a lack of multilevel comparable empirical findings. Given these deficiencies, this work develops a new index – the MLSDI – that comparably measures multilevel contributions to sustainable development, supports decisions at all tiers, comprehensively studies interconnections of sustainable development elements, and is applicable in practice. The MLSDI's methodology follows in the next chapter, Chapter 4.


Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Chapter 4**

### **Methodology**

Because sustainable development only becomes defined when measured (see Chapter 3; e.g. Bell & Morse, 2008), sustainable development index construction is an unsupervised modelling task without a supervising output variable (G. James, Witten, Hastie & Tibshirani, 2013). Consequently, sustainable development measurement is diverse in methods and methodologies (see Section 3.2, Section 3.3, and Section 4.2) and hallmarked by subjectivity and arbitrariness (e.g. Böhringer & Jochem, 2007), such that sustainable development indicators are rather confusing and non-consensual (Pope et al., 2017; Ramos & Moreno Pires, 2013). To counteract this finding and to achieve objectivity in assessment (see Table 3.1; Sala et al., 2015), the previous theoretical research is coupled with a profound methodological research. The conceptual framework derived in Chapter 2 has resulted in assessment principles in Section 3.1, and these now guide the methodological choices to be made from a pool of alternative techniques for each index calculation step. Thereby, methodological shortcomings of previous indices are overcome, which constitute the fifth and last research gap. Moreover, methodological understanding of the interactions of the individual sustainable development elements will be established by the end of this chapter: The knowledge gap (see Section 2.3.3; e.g. Weitz et al., 2018) is addressed by the index computation (see Section 3.2).

The first part of this chapter, Section 4.1, introduces the calculation steps of a sustainable development index and establishes methodological requirements based on the assessment principles in Section 3.1. By means of these requirements, the methodological approaches of the indices identified in Section 3.3.2 and Section 3.3.3 are evaluated in Section 4.2. The main part of this chapter, Section 4.3, addresses the MLSDI's methodology. First, data are collected (see Section 4.3.1), prepared (see Section 4.3.2 and Section 4.3.4), and cleaned (see Section 4.3.3 and Section 4.3.5); second, the major index computation steps are executed (see Section 4.3.6 to Section 4.3.8); and third, sensitivities are investigated (see Section 4.3.9). This chapter ends with a summary and interim conclusion in Section 4.4 that conflate the theoretical investigation of Chapter 2 and Chapter 3 as well as the methodological research of this chapter.

© The Author(s) 2021 C. Lemke, *Accounting and Statistical Analyses for Sustainable Development*, Sustainable Management, Wertschöpfung und Effizienz, https://doi.org/10.1007/978-3-658-33246-4\_4

### **4.1 Overview of sustainable development indices' calculation steps and methodological requirements**

Sustainable development indices are typically constructed in nine steps. These are visualised in Figure 4.1, and a primer can be found in Nardo et al. (2008). The first calculation step comprises the collection of sustainable development key figures (see Section 4.3.1). Key figures are the raw data to collect. For transparency (see Table 3.1; e.g. Pint´er et al., 2018), data acquisition should be open access. The preparation of sustainable development key figures is realised in the second calculation step (see Section 4.3.2) and is necessary because data from different aggregational objects (micro, meso, and macro) must be harmonised for multilevel object comparability (see Table 3.1; e.g. Hacking & Guthrie, 2008) and methodological soundness in terms of credibility, validity, and reliability (see Table 3.1; e.g. Cash et al., 2003; Janouˇskov´a et al., 2018). This step is typically not included in sustainable development index calculations because Rotmans et al.'s (2001; see Section 2.3.1) multilevel perspective is disregarded (see Section 3.3.2 and Section 3.3.3). Imputation of missing values is performed (see Section 4.3.3) to turn the key figures' incomplete data set into a complete one (van Buuren, 2012), reducing statistical biases (e.g. Little & Rubin, 2002) and ensuring the assessment principle methodological soundness (see step two). Imputation is deployed on key figures (i.e. the raw data) in order to prevent possible biases that would arise from afore-going calculations such as standardisation accomplished in the next step. Standardisation to sustainable development key indicators is realised with the complete sample of key figures (see Section 4.3.4). It accounts for different aggregational sizes of micro, meso, and macro objects of investigation and ensures the assessment principle multilevel object comparability (see Table 3.1; e.g. Hacking & Guthrie, 2008). 
Moreover, the key indicators are primarily responsible for the assessment principle compliance with framework (see Table 3.1; e.g. Pintér et al., 2018). For instance, the key indicators critically determine the comprehensiveness (e.g. Böhringer & Jochem, 2007) and capability of multilevel application of an index because the key indicators are an index's data input. Both key figures and key indicators are variables in terms of input data at certain stages of an index. In this context, Waas et al.'s (2014; see Section 3.3) finding that a variable becomes an indicator when linked to a reference value is disregarded. In order to prevent misunderstandings, the term "variable" is only used in general contexts of a method's input data, and when referring to input data of a sustainable development index, "key figure" or "key indicator" is used, respectively. Furthermore, a methodologically sound index only contains relevant key indicators (see Table 3.1; Janoušková et al., 2018) and maps both the efficiency and the effectiveness of sustainable development performances (see Table 3.1; e.g. Figge & Hahn, 2004).


**Figure 4.1** Calculation steps of a sustainable development index

In the fifth calculation step, outlier detection and treatment is conducted (see Section 4.3.5) to diminish statistical biases (Hadi, Rahmatullah Imon & Werner, 2009) and once more induce methodological soundness (see step two). Outliers of the key indicators rather than of the key figures are treated because outliers primarily impact scales, which are computed with the key indicators in the next step (see step six). For detection and treatment, a perspective of information loss should be adopted, and statistical bias should be balanced with distortion of the true picture (e.g. McGregor & Pouw, 2017; Zhou, Fan & Zhou, 2010). Scaling the key indicators (sixth step) harmonises the key indicators' diverse units (see Section 4.3.6). This step complies with the assessment principle indicator comparability (see Table 3.1; e.g. Pintér et al., 2018) and methodological soundness (see step two) because scaling is essential for a meaningful aggregation to be realised in the eighth calculation step (see step eight; e.g. Ebert & Welsch, 2004). Because different types of scales contain distinct degrees of information, the chosen scaling procedure should minimise loss of information (e.g. Zhou et al., 2010). Moreover, scales should empower compliance with the assessment principles target and boundary orientedness (see Table 3.1; e.g. Sala et al., 2015) as well as effective communication (see Table 3.1; e.g. Pintér et al., 2018). A further clarification of terminology is required: Both standardisation and scaling are concerned with transformation of different scales onto one common scale. "Normalisation" is a further synonym (Pollesch & Dale, 2016).
To avoid misunderstandings between the fourth calculation step – standardisation of the key figures to the key indicators for multilevel object comparability (see Section 4.3.4) – and the sixth calculation step – scaling the key indicators for indicator comparability (see Section 4.3.6) – the terms "standardisation" and "scaling" are exclusively used for their respective purposes. The expression "normalisation" remains unused.

**Figure 4.2** Layers of an overall sustainable development index

The seventh calculation step accomplishes weighting of scaled key indicators (see Section 4.3.7). This step is essential for assessing relationships among the data (e.g. Greco, Ishizaka, Tasiou & Torrisi, 2019) and accounting for synergies and trade-offs (see Table 3.1; e.g. Costanza, Fioramonti & Kubiszewski, 2016). Thereby, it is the substantive step in closing the knowledge gap (see Section 2.3.3; e.g. Weitz et al., 2018). In doing so, methodological soundness in terms of objectivity (see Table 3.1; Sala et al., 2015) and relevance should be guaranteed (see Table 3.1; Janoušková et al., 2018). The eighth step performs aggregation (see Section 4.3.8). First, scaled and weighted key indicators are aggregated into sustainable development subindices of each contentual domain. Second, these are combined into an overall sustainable development index. Figure 4.2 visualises the layers of an overall sustainable development index. The implemented aggregation function moderates the degree of substitutability (Grabisch, Marichal, Mesiar & Pap, 2009) and is hence guided by the allowance of weak sustainability with minimised substitutability within the safe and just operating space for humanity (see Section 2.2.4). Furthermore, the aggregation function must interplay meaningfully with the underlying scales for methodological soundness (see step six; e.g. Ebert & Welsch, 2004) and also minimise loss of information (e.g. Zhou et al., 2010). Last, in the ninth step, sensitivity analyses are carried out for calculation steps that provide alternatives (see Section 4.3.9). The aim is to ensure methodological soundness in terms of credibility, validity, reliability, and robustness (see Table 3.1; e.g. Cash et al., 2003; Janoušková et al., 2018; Sala et al., 2015) and enhance transparency (see Table 3.1; e.g. Pintér et al., 2018). In the case of the MLSDI, sensitivities are tested for missing value imputation, outlier detection, and weighting.
For the other calculation steps, the theoretical and methodological research points to one unique approach.

Methodological soundness is emphasised in individual calculation steps despite being effective in each step and the overall computation. Table 4.1 provides a summary of the assignment of the guiding assessment principles and further criteria to the calculation steps of a sustainable development index. Based on this assignment, methodological approaches of the nine identified sustainable development indices (see Section 3.3.2 and Section 3.3.3) are evaluated in the following section, Section 4.2. In contrast to the indices' evaluation in Section 3.3.2 and Section 3.3.3, methodological soundness and the assessment principles' connection to an index's major calculation steps (step six to step nine) are focused on.

**Table 4.1** Assignment of the guiding assessment principles and further criteria to the calculation steps of a sustainable development index

### **4.2 Methodological evaluation of sustainable development indices**

The first evaluated index in Section 3.3.2 is the family of DJSI (e.g. S&P Dow Jones Indices, 2018, 2019). It has been concluded that the DJSI are not presented transparently. In this vein, data cleaning (missing value imputation and outlier treatment), sensitivity analyses, scaling, and aggregation are unknown. Full information on weighting is not provided, but it is announced that weights are floating and industry specific. Individual weight adjustment should be refrained from because it disables comparability (see Section 4.3.6.2; Nardo et al., 2008) and grants developers mediating power, setting low weights on underperforming elements (see Section 3.2; Jesinghaus, 2018). The evaluation of the DJSI's methodological soundness and major calculation steps' linkage to assessment principles is portrayed in Figure 4.3a.<sup>27</sup>

**(a)** Dow Jones Sustainability Indices (DJSI) (e.g. S&P Dow Jones Indices, 2018)

**Figure 4.3** Evaluation of methodological soundness and linkage to assessment principles by meso-level indices of sustainable development

The other identified meso-level sustainable development index is the ICSD (Krajnc & Glavič, 2005). It neither imputes missing values nor treats outliers, nor does it test sensitivities (see Figure 4.3b). Data cleaning might be superfluous because of the small sample size, but a holistic methodological approach prepares for occasions in which data cleaning becomes necessary (Nardo et al., 2008). Scaling is accomplished by ratio scaling with target setting. Key indicators are divided by company targets, implementing the assessment principle target and boundary orientedness. However, ratio scaling entails mathematical inconsistencies (see Section 4.3.6.2; Pollesch & Dale, 2016), and scores are difficult to interpret, such that effective communication is harmed. Weights are determined by the analytical hierarchy process, which involves critical subjectivities (see Section 4.3.7.1; Zhou, Ang & Poh, 2006). Arithmetic aggregation is applied, but this aggregation function is not compatible with the underlying scales, leading to meaningless results (see Section 4.3.8; e.g. Ebert & Welsch, 2004). Moreover, arithmetic aggregation implements weak sustainability but does not minimise substitutability (see Section 4.3.8; e.g. Pollesch & Dale, 2015).

Among the identified macro-level indices, the FEEM SI is the first index to be examined (e.g. Pinar et al., 2014). Missing values are not imputed, but outliers are treated with lower weights (see Figure 4.4a). Compared to a non-treatment, this procedure is progressive, but biases remain (see Section 4.3.5.2; Rässler, Rubin & Zell, 2013). Policy targets are included in the scaling procedure, which is performed by rescaling. The data range on a discrete interval from zero to one. Rescaling yields easily understandable scores, encouraging effective communication. However, scales should be continuous to minimise information loss (see Section 4.3.6 and Section 4.3.7.4; e.g. Yang & Webb, 2009; Zhou et al., 2010). Weights are determined by experts' elicitation, and aggregation relies on the Choquet integral, which allows for preference-based index construction. Neither experts' elicitation nor the Choquet integral follows the assessment principle objectivity. Notwithstanding, sensitivities of experts' preferences are tested.

<sup>27</sup>References and sources of the assessment principles are not repeated in this section but can be found in Section 3.1, Section 4.1, and Section 4.3.

The HSDI does not clean data, nor does it test sensitivities (see Figure 4.4b; e.g. Bravo, 2018). Equal weights are applied, ignoring correlations of indicators. Equally weighted correlated variables entail double counting of the correlated information, implicitly upgrading their weights (see Section 4.3.7.1; Greco et al., 2019; Nardo et al., 2008). Hence, equal weights are "universally considered to be wrong" (see Section 4.3.7.1; e.g. Chowdhury & Squire, 2006). Data are scaled between zero and one and aggregated geometrically. Geometric aggregation implements weak sustainability with minimised substitutability (see Section 4.3.8; e.g. Pollesch & Dale, 2015). However, it obtains overall zero results when combined with a lower rescaling bound of zero. In other words, substitutability vanishes, and thus, the lower bound should be raised (see Section 4.3.6.2 and Section 4.3.8; Saisana & Philippas, 2012).
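The interplay between the aggregation function and the lower rescaling bound can be illustrated numerically. The following is a minimal sketch in Python; the scores, the equal weights, and the raised lower bound of 0.1 are hypothetical values chosen for illustration and are not taken from the HSDI.

```python
import math


def arithmetic_mean(scores, weights):
    """Weighted arithmetic mean: a low score is fully offset by high scores."""
    return sum(w * s for s, w in zip(scores, weights))


def geometric_mean(scores, weights):
    """Weighted geometric mean: a score of zero forces the overall result to zero."""
    return math.prod(s ** w for s, w in zip(scores, weights))


weights = [1 / 3, 1 / 3, 1 / 3]   # equal weights (assumption for illustration)
one_zero = [0.0, 0.9, 0.9]        # one indicator sits at a lower rescaling bound of zero
raised = [0.1, 0.9, 0.9]          # same case with a hypothetical raised lower bound of 0.1

arith_zero = arithmetic_mean(one_zero, weights)   # remains clearly positive
geo_zero = geometric_mean(one_zero, weights)      # collapses to zero
geo_raised = geometric_mean(raised, weights)      # positive again, substitutability is restored

print(arith_zero, geo_zero, geo_raised)
```

With a lower bound of zero, the geometric result is zero regardless of the other scores, whereas a raised lower bound keeps limited substitutability intact.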

The MISD comprises 31 indices (e.g. Shaker, 2018). Therefore, an overall methodological evaluation is not feasible. Concentrating on the MISD, it does not treat outliers despite recognising issues in computation (see Figure 4.4c). However, it overcomes other indices' methodological shortcomings in terms of missing value imputation: The MISD fills missing values by multiple imputation, reducing statistical biases (see Section 4.3.3; e.g. Little & Rubin, 2002) and accounting for uncertainties in the imputation process (see Section 4.3.3.3; e.g. Schafer & Graham, 2002). Furthermore, it determines weights by multivariate statistical analysis, which is generally the preferred field of methods (see Section 4.3.7.1; Mayer, 2008). However, factor analysis is not suitable for sustainable development index construction because it is a top-down approach (see Section 4.3.7.1; Haerdle & Simar, 2012). Similar to the HSDI, rescaling between zero and one is combined with geometric aggregation. Sensitivities are not investigated.

The SDGI deliberately leaves missing values untreated in order to draw attention to missing data, although cold deck or mean imputation is carried out in a few exceptional cases (see Figure 4.4d; Schmidt-Traub et al., 2017b). Neither method fully eliminates statistical biases (see Section 4.3.3.2; Rässler et al., 2013). The SDGI claims to follow Nardo et al.'s (2008)<sup>28</sup> recommendation "truncating the data by removing the bottom 2.5 percentile from the distribution" (Schmidt-Traub et al., 2017b). Replacing outliers with thresholds is methodologically sound, but Nardo et al. (2008) advise shortening both the bottom and the top of a distribution; one-sided treatment is not reasonable (see Section 4.3.5.2). Rescaling between zero and 100 is appropriate in the context of arithmetic aggregation. However, the arithmetic mean should be avoided and likewise should equal weights (see above). Sensitivities are tested for outlier thresholds and the aggregation function.

<sup>28</sup>Schmidt-Traub et al. (2017b) reference a 2016 publication. To the best of the author's knowledge, the here cited 2008 publication by Nardo et al. is the most recent one at the time of research.

**(a)** FEEM Sustainability Index (FEEM SI) (e.g. Pinar et al., 2014)

**(b)** Human Sustainable Development Index (HSDI) (e.g. Bravo, 2018)

**(c)** Mega Index of Sustainable Development (MISD) (e.g. Shaker, 2018)

**(d)** Sustainable Development Goal Index (SDGI) (e.g. Schmidt-Traub et al., 2017b)

**(e)** Sustainable Development Index (SDI) (Bolcárová & Kološta, 2015)

**(f)** Sustainable Society Index (SSI) (e.g. van de Kerk et al., 2014)
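The difference between one-sided and two-sided threshold replacement can be sketched as follows. This is an illustrative Python example only: the data are hypothetical, the symmetric 2.5 and 97.5 percentile thresholds are an assumption that mirrors the 2.5 percentile quoted above, and the function name `replace_by_thresholds` is introduced here for illustration.

```python
import numpy as np


def replace_by_thresholds(values, lower_pct=2.5, upper_pct=97.5):
    """Replace values beyond the percentile thresholds with the threshold values."""
    values = np.asarray(values, dtype=float)
    lower, upper = np.percentile(values, [lower_pct, upper_pct])
    return np.clip(values, lower, upper)


# Hypothetical key indicator with one extreme value at each end of the distribution.
data = np.array([-50.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 500.0])

two_sided = replace_by_thresholds(data)                     # bottom and top are treated
bottom_only = replace_by_thresholds(data, upper_pct=100.0)  # one-sided: the top stays untreated

print(two_sided.min(), two_sided.max())
print(bottom_only.min(), bottom_only.max())
```

In the one-sided variant, the extreme value at the top remains in the data and continues to stretch the scale.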

The SDI does not treat outliers, nor does it investigate sensitivities (see Figure 4.4e; Bolcárová & Kološta, 2015). It imputes missing values, but the chosen mean imputation still leads to invalid inferences (see Section 4.3.3.2; Rässler et al., 2013). Sound weighting is executed by application of multivariate statistical analysis. In particular, the bottom-up Principal Component Analysis (PCA) is deployed (see Section 4.3.7.1 and Section 4.3.7.2; e.g. Mayer, 2008). Classical scaling and aggregation in PCA are z-scores (mean equal to zero and variance equal to one) and arithmetic aggregation. Both are retained in the SDI. Arithmetic aggregation does not fulfil the methodological criteria (see above). Z-scores are not favourable because they are difficult to interpret, and due to negative values, they cannot be combined with geometric aggregation (see Section 4.3.6.2; e.g. Field, 2009).

The SSI imputes missing values by expert judgement (e.g. van de Kerk et al., 2014). Compared to a non-imputation case, bias is reduced, but the assessment principle objectivity is violated (see Figure 4.4f). Outliers are identified with thresholds on skewness and kurtosis and treated by non-linear scale transformations. Neither method is recommendable. First, skewness and kurtosis are not robust to outliers because outliers inflate these measures, such that outliers might not be detected as such (see Section 4.3.5.2; e.g. Aggarwal, 2017; Hadi et al., 2009). Second, non-linear transformation is particularly harmful in index calculation because it changes correlations between variables (see Section 4.3.5.2; Oh & Lee, 1994), while correlations should be investigated in statistical weighting procedures (see Section 4.3.7.1; e.g. Mayer, 2008). The SSI does not deploy statistical but top-down equal weighting. On the one hand, the non-linear transformations are therefore not harmful because correlations are not investigated. On the other hand, equal weights are not sufficient (see above). Furthermore, the justification of the SSI to implement equal weighting because "[t]here are no highly correlated indicators (all Pearson correlations coefficients are lower than 0.82)" (Saisana & Philippas, 2012) might be false: Correlation coefficients greater than 0.8 typically indicate very high correlations (Field, 2009). Apart from that, Pearson's coefficient might be inappropriate because it assumes normality (see Section 4.3.3.3; Field, 2009), which is not tested in the SSI. Nonetheless, sound scaling, sound aggregation, and sensitivity analyses are executed: Geometric aggregation is applied on the rescaled indicators, and sensitivities are tested for the weighting procedure. The rescaling range starts at one and ends at ten; substitutability is maintained throughout the entire range.
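That a non-linear transformation changes correlations can be checked with a small numerical sketch. The data below are hypothetical, and the natural logarithm merely stands in for a non-linear scale transformation; it is not claimed to be the transformation used in the SSI.

```python
import numpy as np

# Hypothetical indicators: y is an exact linear function of x, and x has one extreme value.
x = np.array([1.0, 2.0, 3.0, 4.0, 100.0])
y = 2.0 * x

# Pearson correlation before the transformation: perfectly linear relationship.
r_raw = np.corrcoef(x, y)[0, 1]

# Pearson correlation after log-transforming only the skewed indicator x.
r_log = np.corrcoef(np.log(x), y)[0, 1]

print(round(r_raw, 4), round(r_log, 4))
```

The coefficient drops below one after the transformation, so any weighting procedure that relies on correlations would work with altered input.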

The WI partially deals with missing values, but the method remains unknown (see Figure 4.4g; Prescott-Allen, 2001). Outliers are detected and replaced by respective threshold values. However, the detection is one-sided (at the top). Weighting is arbitrary, arithmetic aggregation is applied, and sensitivities are not tested. On the positive side, rescaling between zero and 100 is implemented.

In conclusion, previous sustainable development indices not only lack compliance with the assessment principles (see Section 3.3.2 and Section 3.3.3) but also fail to meet methodological and scientific requirements (see above; e.g. Böhringer & Jochem, 2007). This forms the fifth and last research gap. Major criticisms include non-comprehensive scope (das Neves Almeida, Cruz, Barata & García-Sánchez, 2017; Frugoli, Villas Bôas de Almeida, Agostinho, Giannetti & Huisingh, 2015; Singh et al., 2012); insufficient weighting, not addressing interconnections of indicators (i.e. knowledge gap; see Section 2.3.3; e.g. Weitz et al., 2018); meaningless aggregation; missing sensitivity analyses (Böhringer & Jochem, 2007; Singh et al., 2012); and statistical biases as a result of unsatisfactory data cleaning.

To overcome these conceptual and methodological shortcomings, the following section, Section 4.3, conducts profound methodological research on each calculation step of a sustainable development index. The MLSDI's methodology will be the result.

### **4.3 Methodology of the Multilevel Sustainable Development Index (MLSDI)**

This section addresses each calculation step of a sustainable development index in detail and derives the MLSDI. On that account, broad methodological research is carried out, and a variety of methods are reviewed to make profound decisions. This section's structure follows the nine calculation steps (see Figure 4.1).

#### **4.3.1 Collection of sustainable development key figures**

The first step in the calculation process is the collection of sustainable development key figures. These are inferred from the sustainable development key indicators, and further information will follow in Section 4.3.4. Decisive in the key figure collection process is data availability: Data must be available by official statistics. Official statistics are open access and hence easily acquired (Zuo, Hua, Dong & Hao, 2017), addressing the sustainability gap (see Section 2.3.4; e.g. Hall et al., 2017) and ensuring the assessment principle transparency (see Table 3.1; e.g. Pintér et al., 2018).

The structure of the set of sustainable development key figures c<sub>5</sub> follows from the conceptual framework (see Chapter 2) and is formally denoted by:

$$c_5 = c_5(n, x, t, r),\tag{4.1}$$

**Figure 4.5** Structure of the sustainable development key figures' data set

where n ∈ [1, N] represents an economic object of the change agent group business of any aggregational size, x ∈ [1, X] portrays a sustainable development key figure, t ∈ [1, T] depicts a time period, and r ∈ [1, R] is a geographical region. The structure of the set of key figures c<sub>5</sub> is illustrated in Figure 4.5. Economic objects n are stored in rows, columns contain key figures x, tables represent time periods t, and geographical regions r constitute the fourth axis.
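The four-dimensional structure of the key figures' data set can be sketched as an array. This is a minimal illustration in Python with NumPy; the dimension sizes and the stored value are hypothetical, and the array name `c5` is chosen here only to mirror the notation of Equation 4.1.

```python
import numpy as np

# Hypothetical sizes: N economic objects, X key figures, T time periods, R regions.
N, X, T, R = 3, 2, 4, 1

# Start with an empty (all-missing) data set; axes follow the order (n, x, t, r).
c5 = np.full((N, X, T, R), np.nan)

# Store one observation: object n = 1, key figure x = 2, period t = 3, region r = 1
# (the array itself uses zero-based indices).
c5[0, 1, 2, 0] = 42.0

# One "table" of the data set: all objects (rows) and key figures (columns)
# for a fixed time period and region.
table_t3 = c5[:, :, 2, 0]

print(c5.shape, table_t3.shape)
```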

Neither society, policy, nor science are objects of investigation but participate in the transition to sustainability by, for instance, designing, performing, or drawing conclusions on the analysis (see Section 3.2). Moreover, as a consequence of the multilevel perspective, economic objects n are organised in an inclusive hierarchy: Multiple layers are nested within each other (Steenbergen & Jones, 2002), and higher ranked economic objects n contain lower ranked economic objects n. That is, macro-economic objects n such as conglomerates of institutions or organisations comprise meso-economic objects n such as networks, communities, or organisations, and these in turn encompass micro-economic objects n such as individuals and individual actors (see Section 2.3.1; Rotmans et al., 2001). In contrast, in an exclusive hierarchy, objects that are ranked lower are not included in objects that are ranked higher (Gibson, Ostrom & Ahn, 2000). To avoid complex multilevel methods, which implicitly account for double counts arising from the inclusive hierarchy, the inclusive hierarchical multilevel data structure is eliminated before the MLSDI's modelling process. Section 5.1 will reveal that the industry level is maintained, while potential corporations, aggregated branches, or overall economies are eliminated. Bias from the elimination is not expected because of the inclusiveness. Not potential corporations at the meso level but industries at the macro level are maintained because sustainable development is a macro-level concept (see Section 2.3.2; e.g. T. Hahn et al., 2015).

The following section, Section 4.3.2, describes the preparation of key figures x.

#### **4.3.2 Preparation of sustainable development key figures**

The key figures' preparation homogenises data formats to enable multilevel comparability and to accomplish the assessment principle methodological soundness in terms of credibility, validity, and reliability (see Table 3.1; e.g. Cash et al., 2003; Janoušková et al., 2018). With respect to multilevel comparability, meso-economic company data are transferred to macro-economic categories (see Section 4.3.2.1). A transfer from meso to macro and not vice versa is performed because economic objects n at the macro level (i.e. industries) are maintained (see Section 4.3.1). In Section 4.3.2.2, statistical classifications of macro-economic data are transformed because not all data are released in the same classification scheme. For both transformations, it is anticipated that Germany is the sample region r (see Section 5.1) and that data are acquired from the Statistical Office of the European Communities (Eurostat) and the Federal Statistical Office of Germany (Destatis). The implemented transformation methods in this work are equivalent to the approaches by the statistical offices.

#### **4.3.2.1 Meso-level transformation to macro-economic categories**

Typically, corporations report revenues, costs, and profits, while the macro-economic Gross Value Added (GVA) is required for standardisation of the key figures x. This finding is derived in Section 4.3.4. To allow for the demanded standardisation, meso-economic data are transferred to the GVA, which "is a measure of the contribution to GDP made by an individual producer, industry or sector" (EC et al., 2009). It can be calculated in several ways. Computation via the gross and net output is shown in Table 4.2.<sup>29</sup> Another way of calculation is to first determine the intermediate consumption or input (marked with "†" in Table 4.2) and subsequently subtract it from the gross output. The output measures all goods and services produced and not used up by the same establishment, while the intermediate consumption or input comprises goods and services used up in the production process (EC et al., 2009). Further definitions can be found in Destatis (2019c) and EC et al. (2009).
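The second route of calculation, subtracting the intermediate consumption from the gross output, can be written compactly. The figures in the example are hypothetical and serve only to illustrate the arithmetic; they are not taken from Destatis data.

$$\text{GVA} = \text{gross output} - \text{intermediate consumption}, \qquad \text{e.g.}\quad 120 - 75 = 45 \ \text{(million EUR)}.$$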

#### **4.3.2.2 Macro-level transformation of statistical classifications**

This section deals with transformations of official statistical classifications. In the EU and hence in Germany, official macro-economic statistical data are released in Classification of Products by Activity (CPA) or Statistical Classification of Economic Activities in the European Community (NACE) (Eurostat, 2008a, 2008b). The first classification scheme classifies products, and the latter groups industries, which typically produce more than one product. For the analysis of sustainable development performances by macro-economic objects n, both classifications are valid. However, because companies usually produce various products that belong to more than one CPA class, meso-economic corporate data are generally classified according to NACE. A company's NACE assignment is accomplished according to its main field of activity (Destatis, 2019c). Therefore, data classified according to NACE are prerequisites for multilevel comparability (see Section 4.3.4) that is methodologically sound.

<sup>29</sup>Publications in German from Destatis are utilised because, in contrast to methodological aspects, meso-economic data collection is decentralised in the European Union (EU).

**Table 4.2** Calculation of the Gross Value Added (GVA) with meso-economic data (Destatis, 2019c); †, intermediate consumption

= **Gross output**

= **Gross Value Added (GVA)**

Some official macro-economic statistical data are released in CPA, such that transformations from CPA to NACE are necessary. This is undertaken by methods deployed in the calculation of input-output tables. Input-output tables are symmetric matrices that serve to present the process of production, use of goods and services, as well as the income generated (Eurostat, 2008a).<sup>30</sup> They are transformations of supply and use tables, and both contain products in CPA in their rows and industries in NACE in their columns. Transforming supply and use tables to input-output tables either yields industry-by-industry or product-by-product tables. Destatis computes product-by-product tables with the product technology assumption (Destatis, 2010a). This assumption states that "[e]ach product is produced in its own specific way, irrespective of the industry where it is produced" (Eurostat, 2008a). In the computation process, secondary products are relocated to industries, such that they become primary products. Primary products are products that are related to one industry by definition (Eurostat, 2008a). For input-output tables, these are diagonal elements, whereas secondary products are off-diagonal elements. The technology matrix M<sub>T</sub> realises the transformation of classifications and reads:

$$M_T = \left( (I \cdot S)^{-1} \cdot S \right)^t,\tag{4.2}$$

where I is an identity matrix, and S depicts a symmetric supply table. Due to the transposition, the technology matrix M<sub>T</sub> contains industries in the rows and products in the columns. To complete the transformation, the technology matrix M<sub>T</sub> is multiplied with a sustainable development key figure in CPA x<sup>CPA</sup>, yielding the respective sustainable development key figure in NACE x<sup>NACE</sup>:

$$x^{NACE}(n,t,r) = M_T \cdot x^{CPA}(n,t,r). \tag{4.3}$$

The set of sustainable development key figures in NACE c<sub>5</sub><sup>NACE</sup> is represented by:

$$c_5^{NACE} = c_5^{NACE}(n, x^{NACE}, t, r). \tag{4.4}$$

For the remainder of this work, key figures in NACE x<sup>NACE</sup> are regarded but simply denoted by "x". Their set is also simply denoted by "c<sub>5</sub>".

On this data set, missing values are imputed as described in the following section, Section 4.3.3.

#### **4.3.3 Imputation of missing values**

Missing values or missing data are underlying but unobserved data (Rässler et al., 2013). Assuming that missing values are meaningful for the modelling and analysis process, they cause a bias if they remain untreated: The observed data dominate the result (Little & Rubin, 2002). As missing data frequently occur in sustainable development quantification (e.g. Schmidt-Traub et al., 2017a), dealing with them is an essential step, contributing to the methodological soundness of an index in terms of credibility, validity, and reliability (see Table 3.1; e.g. Cash et al., 2003; Janoušková et al., 2018). Generally, there are four approaches to address missing values, converting the incomplete sample to a complete one (van Buuren, 2012): complete case analyses, weighting procedures, model-based procedures, and imputation-based procedures. Complete case analyses ignore objects with missing data, weighting procedures weight non-response objects less, model-based procedures specify a model with the observed data, and last, imputation-based procedures estimate missing values (Little & Rubin, 2002; Rässler et al., 2013). Generally, only model-based and imputation-based procedures yield valid inferences (Rässler et al., 2013). Imputation is chosen to handle missing values because it does not require modelling that is specific to the missing data; this would lead to a loss of generality in application.

<sup>30</sup>Eurostat's (2008a) *Manual of supply, use and input-output tables* was released under the European System of Accounts (ESA) 1995. The currently valid standard is ESA 2010 (Eurostat, 2013). An updated manual has not been released at the time of research. However, the utilised method is expected to remain valid without changes under the updated standard.

This section is structured as follows. First, missing values are characterised (see Section 4.3.3.1). Second, two imputation methods are presented: The MLSDI's single imputation method is derived in Section 4.3.3.2, and its multiple imputation method follows in Section 4.3.3.3. Last, statistical tests of model assumptions are outlined in Section 4.3.3.4.

#### **4.3.3.1 Characterisation of missing values**

Three characteristics of missing values are crucial in determining suitable imputation methods: the missing data pattern, degree of missingness, and the missing data mechanism. The missing data pattern describes the structure of observed and unobserved data in the data set and can be, for instance, univariate, monotone, or general (Little & Rubin, 2002). General missingness is also referred to as non-monotone (van Buuren, 2012) or arbitrary (Schafer & Graham, 2002). Figure 4.6 visualises these patterns. Further patterns can be found in, e.g. Little and Rubin (2002).

The degree of missingness can be analysed according to unit non-response and item non-response. Unit non-response refers to objects that do not deliver any information. Item non-response regards an object's missingness of one or more variables. The rate of missing values λ is the ratio of unobserved to total data and indicates the severity of the missing data problem (Rässler et al., 2013; van Buuren, 2012).
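The degree of missingness can be computed directly from a data matrix. The following Python sketch uses a small hypothetical data set with objects in rows and variables in columns; the variable names are illustrative.

```python
import numpy as np

# Hypothetical data set: four objects (rows) and three variables (columns).
data = np.array([
    [1.0, np.nan, 3.0],        # item non-response: one variable is missing
    [np.nan, np.nan, np.nan],  # unit non-response: no information at all
    [4.0, 5.0, 6.0],           # completely observed
    [7.0, 8.0, np.nan],        # item non-response
])

missing = np.isnan(data)

# Rate of missing values: unobserved entries divided by all entries (here 5 of 12).
rate_missing = missing.mean()

# Unit non-response: every variable of an object is missing.
unit_nonresponse = missing.all(axis=1)

# Item non-response: some, but not all, variables of an object are missing.
item_nonresponse = missing.any(axis=1) & ~unit_nonresponse

print(rate_missing, unit_nonresponse, item_nonresponse)
```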

The relationship between observed and unobserved data is characterised by the missing data mechanism. The missing data mechanism can be classified into three types. First, if data are Missing Completely at Random (MCAR), missingness is independent of the observed as well as the unobserved data. Second, Missing at Random (MAR) implies that missingness is independent of the unobserved but depends on the observed data. In both cases, distributions of variables are unaffected by inclusion of the missing data, such that the same modelling process can be performed. The non-response is ignorable. Third, Missing Not at Random (MNAR) means that missingness depends on both the observed and the unobserved data, and distributions are influenced by the missingness. In this non-ignorable case, the model for the complete data differs from the incomplete data's model (Little & Rubin, 2002; Rässler et al., 2013; Rubin, 1976; Schafer & Graham, 2002; van Buuren, 2012). Ignorability and MAR are typical in practice (Enders, 2010) and therefore assumed for the MLSDI, such that only MAR methods are researched.

**Figure 4.6** Examples of missing data patterns (based on Little and Rubin, 2002; with friendly permission of © 2002 by John Wiley & Sons, Inc. All rights reserved)

#### **4.3.3.2 Single time series imputation: Various methods depending on the missing data pattern**

Generally, methods for missing value imputation can be divided into single and multiple imputation. Single imputation methods impute missing values only once, whereas multiple imputation methods are simulation techniques that compute several plausible values for the final fill (Rässler et al., 2013). Single imputation does not account for uncertainties in the imputation process, but multiple imputation does (Little & Rubin, 2002). The MLSDI makes use of both single and multiple imputation methods. Single imputation methods are expected to yield valid results because the uncertainty of the imputation process is assumed to be relatively low: Either further data in the time series or higher aggregational economic objects n of the inclusive hierarchy (see Section 4.3.1) are observed (see Section 5.2.1). In order to confirm or reject the expectation of uncertainties having a relatively low effect on the MLSDI's imputation process, single imputation is tested against multiple imputation (see Section 4.3.3.3).

Single imputation methods comprise hot deck imputation, substitution, cold deck imputation, imputation by mean, and (stochastic) regression imputation. In hot deck imputation, data from similar objects serve to impute missing values. Substitution replaces blanks with objects that are not in the initial sample, and cold deck imputation fills missing values with data from external sources (Little & Rubin, 2002; Nardo et al., 2008). Mean imputation uses the sample mean for estimating missing values. In regression imputation, observed data represent independent variable(s) to predict missing, dependent variable(s) (Little & Rubin, 2002). Hot deck and regression imputation are single imputation methods that are capable of correctly reflecting variability of the imputation process (Rässler et al., 2013) and are thus applied in the MLSDI.

Generally, a univariate time series point of view is adopted in the MLSDI's single imputation process for two reasons: First, key figures x show stable trends (see e.g. Figure 5.3), such that time periods t are expected to be reliable predictors; and second, each economic object n is assumed to feature distinct sustainable development characteristics with the result that cross sections are expected to be unreliable predictors. Kalman smoothing on a basic structural time series model fitted by the maximum likelihood method is the preferred single imputation method because it yields more stable results than further time series models such as Autoregressive Integrated Moving Average (ARIMA) models (Harvey, 1989; Kalman, 1960). Additionally, its application enables imputation of the first time period (Moritz, 2018).

A basic structural time series model regards an observation (i.e. a key figure x) as a permanent trend component μ, seasonal component γ, and an irregular random noise ε in time period t. The model is described by the following formula (Harvey, 1989):

$$x(t) = \mu(t) + \gamma(t) + \varepsilon(t). \tag{4.5}$$

On this model, the Kalman filter is applied. It is a recursive algorithm for estimating observations based on the available information (Harvey, 1989). The estimation is a maximum likelihood estimation, and parameters that maximise the likelihood function are searched for. The Kalman filter assumes normally distributed variables, stationarity (i.e. time invariant distributional properties) of data, and independent and identically distributed (i.i.d.) residuals (Greene, 2003; Harvey, 1989). However, Harvey (1989) asserts the Kalman filter to remain an optimal linear estimator that minimises the mean square error if the normality assumption is violated.
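The filtering and smoothing recursions can be illustrated with a minimal sketch. It is not the MLSDI implementation: it assumes a local level model, i.e. Equation 4.5 without the seasonal component, and it takes the two noise variances as fixed inputs instead of estimating them by maximum likelihood. The function name `kalman_smooth_impute` and all parameter names are hypothetical.

```python
from typing import List, Optional, Sequence


def kalman_smooth_impute(
    series: Sequence[Optional[float]],
    var_eps: float = 1.0,       # assumed variance of the irregular component
    var_eta: float = 1.0,       # assumed variance of the level innovations
    diffuse_var: float = 1e7,   # large prior variance: initial level unknown
) -> List[float]:
    """Fill gaps (None) with the smoothed level of a local level model."""
    if all(obs is None for obs in series):
        raise ValueError("at least one observation is required")
    n = len(series)
    a_pred, p_pred = [0.0] * n, [0.0] * n   # one-step-ahead predictions
    a_filt, p_filt = [0.0] * n, [0.0] * n   # filtered estimates
    a, p = 0.0, diffuse_var
    # Forward pass (Kalman filter): update only where a value is observed.
    for t, obs in enumerate(series):
        a_pred[t], p_pred[t] = a, p
        if obs is not None:
            gain = p / (p + var_eps)
            a = a + gain * (obs - a)
            p = (1.0 - gain) * p
        a_filt[t], p_filt[t] = a, p
        p = p + var_eta                     # the level follows a random walk
    # Backward pass (fixed-interval smoother): use later observations as well.
    smoothed = a_filt[:]
    for t in range(n - 2, -1, -1):
        c = p_filt[t] / p_pred[t + 1]
        smoothed[t] = a_filt[t] + c * (smoothed[t + 1] - a_pred[t + 1])
    # Observed values are kept; only the gaps are replaced.
    return [obs if obs is not None else smoothed[t] for t, obs in enumerate(series)]
```

Calling `kalman_smooth_impute([None, 2.0, None, 4.0, 5.0])` fills both the gap inside the series and the unobserved first period, because the backward pass carries information from later observations to earlier time periods.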

Kalman smoothing on a basic structural time series model is not applicable to every missing data pattern but requires a minimum of three observations in a time series. If there are only two observations, the Stineman algorithm is applied (Moritz, 2018). The Stineman algorithm features monotonic properties and thus gives smoother results than, for example, polynomial interpolations (Stineman, 1980). Once again, this property suits the key figures' stable trends (see e.g. Figure 5.3). If there is only one observation in the time series, this value is held constant, and a modified hot deck imputation is deployed: Data from the same economic object n but other time period t are imputed. If an economic object's total time series is unobserved, the inclusive hierarchy (see Section 4.3.1) is taken advantage of: Key figures x of higher aggregational economic objects n are always observed, and their key indicators y (see Section 4.3.4) are computed back to the missing lower aggregational key figures x. This is essentially equivalent to imputing higher aggregational industry means. Rässler et al. (2013) do not approve of mean imputation. However, the presented modified mean imputation is expected to obtain valid results because the inclusive hierarchy reduces uncertainty in the imputation process.

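To summarise, the missing data pattern imposes limitations on the applicability of methods, and four single time series imputation techniques are implemented:

1. Kalman smoothing on a basic structural time series model if at least three observations are available in the time series;
2. the Stineman algorithm if exactly two observations are available;
3. a modified hot deck imputation that holds the value constant if only one observation is available; and
4. a modified mean imputation via the inclusive hierarchy if the total time series is unobserved.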
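The case distinction of this section, which depends only on the number of observations in a time series, can be sketched as a simple dispatch. The function names and the returned labels are hypothetical and chosen for illustration; the sketch only selects a technique and implements the trivial single-observation case.

```python
from typing import List, Optional, Sequence


def choose_single_imputation(series: Sequence[Optional[float]]) -> str:
    """Select the imputation technique from the number of observed values."""
    n_obs = sum(value is not None for value in series)
    if n_obs == len(series):
        return "no_imputation_needed"
    if n_obs >= 3:
        return "kalman_smoothing"            # basic structural time series model
    if n_obs == 2:
        return "stineman_interpolation"
    if n_obs == 1:
        return "hold_value_constant"         # modified hot deck imputation
    return "inclusive_hierarchy"             # modified mean imputation


def hold_value_constant(series: Sequence[Optional[float]]) -> List[float]:
    """Impute every gap with the single observed value of the same object."""
    observed = [value for value in series if value is not None]
    if len(observed) != 1:
        raise ValueError("exactly one observation is expected")
    return [observed[0]] * len(series)
```

For example, `choose_single_imputation([None, 4.2, None])` selects the constant fill, and `hold_value_constant([None, 4.2, None])` carries the value 4.2 to all time periods.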


The MLSDI's multiple imputation method is determined in the following section, Section 4.3.3.3.

#### **4.3.3.3 Multiple panel data imputation: Amelia II algorithm**

Multiple imputation is a simulation technique that treats parameters as random rather than fixed. Thereby, multiple plausible results are rendered possible, and uncertainty of the imputation process is accounted for by adding random noise (Rässler et al., 2013; Schafer & Graham, 2002; Schafer & Olsen, 1998). The imputation is accomplished by random draws from a posterior distribution (Rässler et al., 2013; van Buuren, 2012). The multiple results are combined into one result, usually by the arithmetic mean (Schafer & Graham, 2002). The convergence of the algorithm to the posterior distribution depends on the rate of missing values λ and the number of imputations m. Rubin (1987) shows that the relative efficiency in convergence of an estimate η equals (Schafer & Graham, 2002; Schafer & Olsen, 1998):<sup>31</sup>

$$
\eta = \left( 1 + \frac{\lambda}{m} \right)^{-1}. \tag{4.6}
$$

Equalising the number of imputations m to the percentage rate of missing values λ is recommended by van Buuren (2012). Furthermore, multiple imputation methods are well suited for any missing data pattern (Enders, 2010), and differentiations as in single imputation (see Section 4.3.3.2) are not required.
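As a worked illustration of Equation 4.6, the following sketch computes the relative efficiency η for a given rate of missing values λ and number of imputations m. The helper `imputations_from_missing_rate` encodes one reading of the recommendation above, namely m equal to λ expressed in per cent; both function names are hypothetical.

```python
def relative_efficiency(missing_rate: float, m: int) -> float:
    """Relative efficiency eta = (1 + lambda / m) ** -1 (Equation 4.6)."""
    if m < 1:
        raise ValueError("the number of imputations m must be at least 1")
    return 1.0 / (1.0 + missing_rate / m)


def imputations_from_missing_rate(missing_rate: float) -> int:
    """Set m to the missing rate in per cent (assumed reading of the rule)."""
    return max(1, round(100 * missing_rate))


# A missing rate of 30 % with m = 5 imputations yields eta = 1 / 1.06.
eta_five = relative_efficiency(0.30, 5)
# Levelling m to the percentage rate gives m = 30 and eta = 1 / 1.01.
m_levelled = imputations_from_missing_rate(0.30)
eta_levelled = relative_efficiency(0.30, m_levelled)
```

With λ = 0.30, raising m from 5 to 30 lifts η from about 0.943 to about 0.990, which shows how quickly additional imputations yield diminishing gains in efficiency.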

<sup>31</sup>Rubin's (1987) original formula in units of standard deviations has been adjusted.

Generally, two modelling types exist in the field of multiple imputation: joint modelling (e.g. Rubin, 1987; Schafer, 1997) and fully conditional specification (e.g. van Buuren, 2007; van Buuren, Brand, Groothuis-Oudshoorn & Rubin, 2006). Joint modelling fills missing data by drawing simultaneously from one joint multivariate distribution. In contrast, fully conditional specification imputes missing values One-at-a-Time (OAT) on a series of univariate distributions that are directly specified by the modeller (Mistler & Enders, 2017; van Buuren, 2012). According to Hughes et al. (2014); Liu, Gelman, Hill, Su and Kropko (2014); and Mistler and Enders (2017), joint modelling and fully conditional specification are equivalent under single level multivariate normal data. In contrast, van Buuren (2012) emphasises better theoretical properties of joint modelling and advises preferring this modelling type if the data fulfil the modelling assumptions and if flexibility of individual specification is not demanded. In addition to van Buuren's (2012) argument, joint modelling is preferable for the MLSDI because multiple panel data imputation is aimed to be tested against single time series imputation. A joint multivariate distribution is therefore favoured over a series of univariate distributions.

Several software packages for multiple imputation by joint modelling exist, and overviews can be found in, e.g. Mistler and Enders (2017); and Yucel (2011). Amelia II is applied for multiply imputing the MLSDI's missing data (Honaker, King & Blackwell, 2018). It is the most promising software application in multiple imputation for four reasons: First, it is the only application that uses an expectation maximisation with bootstrapping algorithm (see below); second, several types of prior information can be included; third, simulation studies provide evidence that Amelia II outperforms other programmes such as NORM (Blankers, Koeter & Schippers, 2010; Novo & Schafer, 2015; Schafer, 1997); and fourth, its developers claim it to work well under violation of the normality assumption (Honaker, King & Blackwell, 2011). Non-normal data are likely in index construction, given the numerous key figures x and key indicators y to include for comprehensiveness of the index (see Table 3.1 and Section 4.3.4; e.g. Hacking & Guthrie, 2008). However, the last argument should be carefully considered. Demirtas, Freels and Yucel (2008) show that a violation of normality in multiple imputation produces biased results in their small sample of size 40. The results remain undistorted only for their large sample of size 400, even with high rates of missing values λ such as 75%.

Amelia II works in three steps: bootstrapping, expectation maximisation, and imputation. These are repeated m times. Bootstrapping is a random sampling technique that is faster, more flexible, and easier to use than other techniques such as Markov chain Monte Carlo approaches (Blankers et al., 2010; Honaker et al., 2011).<sup>32</sup> An expectation maximisation algorithm is a framework for maximum likelihood estimation and estimates parameters of a predictive distribution function (Han, Kamber & Pei, 2012).<sup>33</sup> Last, missing values are imputed by drawing from the bootstrapped parameters. Given the m repetitions, m imputed data sets are at hand and combined into one result (Honaker et al., 2011, 2018).

<sup>32</sup>More information on bootstrapping can be found in, e.g. Davison and Hinkley (1998); Efron and Tibshirani (1993); and G. James et al. (2013).
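The three repeated steps can be sketched in a strongly simplified, univariate setting. This is not the Amelia II algorithm itself, which fits a joint multivariate normal model across all variables. The sketch assumes a single normally distributed variable with ignorable missingness, in which case the maximum likelihood estimates that an expectation maximisation algorithm would converge to are simply the mean and variance of the observed values. The function name `multiply_impute` is hypothetical.

```python
import random
import statistics
from typing import List, Optional, Sequence


def multiply_impute(values: Sequence[Optional[float]], m: int, seed: int = 0) -> List[float]:
    """Bootstrap, estimate, and impute m times; combine by the arithmetic mean."""
    if m < 1:
        raise ValueError("the number of imputations m must be at least 1")
    if sum(v is not None for v in values) < 2:
        raise ValueError("at least two observed values are required")
    rng = random.Random(seed)
    draws: List[List[float]] = [[] for _ in values]
    for _ in range(m):
        # Step 1 (bootstrapping): resample the data with replacement.
        while True:
            resample = rng.choices(values, k=len(values))
            observed = [v for v in resample if v is not None]
            if len(observed) >= 2:
                break
        # Step 2 (estimation): in this univariate simplification, the
        # maximum likelihood estimates are the observed mean and variance.
        mu = statistics.fmean(observed)
        sigma = statistics.pstdev(observed)
        # Step 3 (imputation): draw each missing value from N(mu, sigma^2).
        for i, v in enumerate(values):
            if v is None:
                draws[i].append(rng.gauss(mu, sigma))
    # Combine the m imputed data sets; observed values remain unchanged.
    return [v if v is not None else statistics.fmean(draws[i]) for i, v in enumerate(values)]
```

Because each repetition starts from a different bootstrap sample, the m draws for one missing value differ, and this spread is what reflects the uncertainty of the imputation process before the results are averaged.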

For the MLSDI, Amelia II's panel data model is applied on the set of key figures c<sub>5</sub>. As many key figures x as possible are included in the model: Complete key figures x are generally incorporated, and only highly correlated key figures x are excluded (Honaker et al., 2011; Rässler et al., 2013). The correlation analysis can be based on three different correlation coefficients: Pearson's coefficient, Spearman's rho, or Kendall's tau. Pearson's coefficient assumes normally distributed data, while Spearman's rho and Kendall's tau are non-parametric statistics without distributional assumptions (Field, 2009). Normality of key figures x is tested (see Section 4.3.3.4 and Section 5.2.2) to determine the adequate coefficient. Should the data be normal, Pearson's coefficient is chosen. Otherwise, Kendall's tau is calculated because it features better statistical properties than Spearman's rho despite being less popular. The threshold for being highly correlated is set to 0.8 (Field, 2009), boundaries on estimates are equalised to the observed range of values, time effects are specified to be linear and constant across time series and cross sections, the number of imputations m is levelled to the percentage rate of missing values λ, and last, the arithmetic mean is applied to combine the results (see above; Schafer & Graham, 2002; van Buuren, 2012).
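The correlation screening for the non-normal case can be sketched as follows. The sketch computes Kendall's tau in its simplest variant (tau-a, which assumes no ties) and flags pairs of key figures whose absolute coefficient reaches the threshold of 0.8. Applying the threshold to the absolute value, the inclusive comparison, and the function names are assumptions for illustration.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def kendall_tau_a(x: Sequence[float], y: Sequence[float]) -> float:
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("two series of equal length (>= 2) are required")
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return (concordant - discordant) / (len(x) * (len(x) - 1) / 2)


def highly_correlated_pairs(
    figures: Dict[str, Sequence[float]], threshold: float = 0.8
) -> List[Tuple[str, str, float]]:
    """Return all pairs of key figures with |tau| at or above the threshold."""
    flagged = []
    for a, b in combinations(sorted(figures), 2):
        tau = kendall_tau_a(figures[a], figures[b])
        if abs(tau) >= threshold:
            flagged.append((a, b, tau))
    return flagged
```

One key figure of each flagged pair would then be left out of the imputation model; which of the two is dropped is a modelling decision that the sketch does not make.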

In the following section, Section 4.3.3.4, tests for the underlying assumptions of both single and multiple imputation are outlined.

#### **4.3.3.4 Statistical tests of model assumptions**

The first assumption to be tested for both single and multiple imputation is the MAR assumption (see Section 4.3.3.1). However, MCAR is the only testable missing data mechanism as the required information for a MAR or MNAR test is missing (Enders, 2010; van Buuren, 2007). Enders (2010) considers the impossibility of MAR and MNAR tests to be an important problem in practice. In contrast, Collins, Schafer and Kam (2001); Rässler et al. (2013); and Schafer and Graham (2002) assert minor effects and valid inferences as a result of violating assumptions on missing data mechanisms. Furthermore, Rässler et al. (2013) recommend MAR methods in any case because they facilitate the modelling and analysis process while still reducing biases compared to non-treatment. For the MLSDI, Little's (1988) MCAR test is performed because a confirmation of MCAR implies approving MAR. The MCAR test is a multivariate extension of the t-test, evaluating mean differences across subgroups. Under the null hypothesis, data are MCAR: The missing data patterns share a common mean, and the test statistic is approximately χ<sup>2</sup> distributed (Beaujean, 2015; Enders, 2010). The null hypothesis is desired to be retained, and large p-values are therefore demanded: The null hypothesis is not rejected for p-values above 0.05. However, the test suffers from low power, and its usefulness is therefore limited (Enders, 2010).<sup>34</sup>

<sup>33</sup>More information on expectation maximisation algorithms can be found in, e.g. Han et al. (2012); and McLachlan and Krishnan (1997).

Regarding single time series imputation, the three assumptions of the basic structural time series model – normality, stationarity, and i.i.d. – are tested. The Shapiro-Wilk and Kolmogorov-Smirnov tests serve to investigate normality (Conover, 1980; CRAN, 2019; Royston, 1982; Shapiro & Wilk, 1965), stationarity is examined by the augmented Dickey-Fuller test (Dickey & Fuller, 1979, 1981; Trapletti, Hornik & LeBaron, 2018), and the Ljung-Box test is implemented to control for independence of residuals (CRAN, 2019; Ljung & Box, 1978). The Shapiro-Wilk and Kolmogorov–Smirnov tests are nonparametric tests that compare variance scores and distribution functions of the sample to a normal distribution, respectively. Under the null hypothesis, the data are normally distributed. The null hypothesis is desired to be accepted with p-values larger than 0.05. Tests are performed for every time period t because time is an implicit variable. For conciseness, the test results are compiled into one result by the arithmetic mean. In large samples, both tests suffer from type I error (rejection of a true null hypothesis), and thus, visualisation of the data by, for example, histograms should accompany the tests (Field, 2009). The augmented Dickey-Fuller test is a likelihood ratio test, and its null hypothesis states that data are generated by a unit root. That is, data are non-stationary (Dickey & Fuller, 1979, 1981). Consequently, the null hypothesis is desired to be rejected with p-values smaller than 0.05 (Greene, 2003). Last, under the null hypothesis of the Ljung-Box test, residuals are i.i.d. The null hypothesis is desired to be accepted with p-values larger than 0.05 (Brockwell & Davis, 2016; Ljung & Box, 1978). Both the augmented Dickey-Fuller and Ljung-Box tests refer to the temporal dimension, and the tests are carried out once for the total time series; compiling test results is not required.
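To make one of these decision rules concrete, the following sketch computes the Ljung-Box statistic Q = n(n + 2) Σ ρ̂<sub>k</sub><sup>2</sup> / (n − k) over the first h lags and compares it with the 95% quantile of the χ<sup>2</sup> distribution, which is equivalent to requiring a p-value larger than 0.05. Using h degrees of freedom without a correction for estimated model parameters is a simplifying assumption, and the function names are hypothetical.

```python
from typing import Sequence

# 95 % quantiles of the chi-squared distribution for 1 to 5 degrees of freedom.
CHI2_CRITICAL_95 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}


def ljung_box_q(residuals: Sequence[float], lags: int) -> float:
    """Ljung-Box statistic over the first `lags` residual autocorrelations."""
    n = len(residuals)
    mean = sum(residuals) / n
    centred = [e - mean for e in residuals]
    denominator = sum(c * c for c in centred)
    total = 0.0
    for k in range(1, lags + 1):
        rho_k = sum(centred[t] * centred[t - k] for t in range(k, n)) / denominator
        total += rho_k ** 2 / (n - k)
    return n * (n + 2) * total


def independence_not_rejected(residuals: Sequence[float], lags: int = 1) -> bool:
    """True if the null hypothesis of independent residuals is retained."""
    return ljung_box_q(residuals, lags) < CHI2_CRITICAL_95[lags]
```

A strictly alternating residual series such as `[1.0, -1.0] * 10` has a lag-one autocorrelation of −0.95 and a statistic of about 20.9, so the null hypothesis of independence is rejected at the 0.05 level.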

In the case of the Amelia II algorithm, joint multivariate normality is tested with the multivariate Shapiro-Wilk test (Jarek, 2015). Convergence of the algorithm is investigated with overdispersed start values. Amelia II functions correctly if its convergence is independent of the diverse start values (Honaker et al., 2011).

As missing values are not allowed in the aforementioned tests (CRAN, 2019; Jarek, 2015; Trapletti et al., 2018), they are performed after the imputation process. Circular effects might be present, but these are assumed to be low, such that robust test results are obtained.

<sup>34</sup>Details on shortcomings of this test can be found in Enders (2010).

#### **4.3.4 Standardisation to sustainable development key indicators**

Standardisation is the transformation of different scales into one common scale and is generally a univariate problem. It is also referred to as scaling or normalisation (Pollesch & Dale, 2016). In this fourth calculation step, the key figures x are standardised to the sustainable development key indicators y. When regarding this type of transformation, the term "standardisation" is exclusively used.

To implement the multilevel perspective (see Section 2.3.1; Rotmans et al., 2001), only key indicators y that are applicable at micro, meso, and macro levels are admitted to the MLSDI. Object comparability (see Table 3.1; e.g. Hacking & Guthrie, 2008) of micro, meso, and macro objects is ensured by the standardisation. Moreover, key indicators y define "the whole issue" (Moldan, Janoušková & Hák, 2012; Pollesch & Dale, 2016) and critically determine the comprehensiveness (Böhringer & Jochem, 2007; Custance & Hillier, 1998; Zuo et al., 2017) and quality (Amor-Esteban et al., 2018) of an index. Therefore, the key indicators y must be connected to the definition of sustainable development (Böhringer & Jochem, 2007; Pezzey, 1992). Only then, information about sustainable development is captured appropriately, pertinently, and correctly (Amor-Esteban et al., 2018; Janoušková et al., 2018). In conclusion, key indicators are responsible for assuring the assessment principles compliance with a framework (see Table 3.1; e.g. Hacking & Guthrie, 2008; Pintér et al., 2018) and relevance (see Table 3.1; Janoušková et al., 2018).

In Section 2.1, various definitions of sustainable development have been discussed, and in Section 2.2, each contentual domain has been defined. These definitions now serve to define environmental, social, and economic key indicators: Environmental key indicators are data that reflect harm induced by mankind or degradation of the natural world, social key indicators are defined as data that indicate a just satisfaction of human needs, and last, economic key indicators are data that allude to material and financial success required for environmental protection and social development.

At the macro level, 234 SDG indicators (see Section 2.3.3; UN, 2018, 2019b) are relevant, as the UN has released the most elaborated concept of sustainable development (see Section 2.1; Lock & Seele, 2017). At the meso level, the GRI disclosures (see Section 3.3.1; GRI, 2016) are most pertinent because GRI is the most widely used standard for corporate reporting on sustainable development (see Section 3.3.1; KPMG, 2017). The economic domain's disclosures are supported by the International Accounting Standards (IAS) and the International Financial Reporting Standards (IFRS) (IASB, 2018) because the GRI and the SDG frameworks lack several economic disclosures, presumably to avoid repetitions with the IAS and the IFRS. Micro frameworks could not be identified, such that embracement of multiple perspectives is currently limited to the meso and the macro levels. The intersection of the meso GRI and the macro SDG frameworks determines the ideal set of sustainable development key indicators c<sub>4</sub>, which is formally represented by:

$$c_4 = c_4(n, y, t, r), \tag{4.7}$$

where y ∈ [1, Y]. The alignment of the frameworks is based on GRI and UNGC (2018a). From the ideal set of key indicators c<sub>4</sub>, the ideal set of key figures c<sub>5</sub> is inferred (see Section 4.3.1). By aligning the GRI and the SDG frameworks, the criticism of the GRI framework following the business case of sustainability (see Section 3.3.1; Landrum & Ohsowski, 2018) is implicitly handled because the SDGs follow societal instrumental finality and paradox teleological integration by definition (see Section 2.3.3).

Furthermore, the set of key indicators c<sub>4</sub> is required to fulfil the assessment principle efficiency and effectiveness (see Table 3.1; Figge & Hahn, 2004). Therefore, two types of indicators – efficiency and effectiveness indicators – build the MLSDI. Efficiency indicators were initially developed in the environmental domain (Schaltegger & Sturm, 1989) and termed "eco-efficiency indicators". Maxime, Marcotte and Arcand (2006); and Verfaillie and Bidwell (2000) define an eco-efficiency indicator as the ratio of the production value and corresponding environmental influence. The production value quantifies the volume of produced products in physical or monetary units.<sup>35</sup> The environmental influence measures the effect on the environment arising from the production. Hence, eco-efficiency indicators capture the relationship of economic growth and environmental degradation. Their decoupling is desired, but their causal relationship is ambiguous (see Section 2.2.3). The eco-efficiency concept can be transferred to the social and economic domains, with the general indicator label efficiency indicator. Efficiency indicators are also referred to as productivity indicators (e.g. Eurostat, 2018; Huppes & Ishikawa, 2005; UN, 2018), whereas their reciprocal yields intensity indicators (Huppes & Ishikawa, 2005; Maxime et al., 2006; Verfaillie & Bidwell, 2000).<sup>36</sup>

Efficiency indicators' components – their specific metrics and reporting units – are controversially discussed, and diverse recommendations are given. Examples include standardisation by units of products, production volume in physical units (GRI, 2016; Maxime et al., 2006; Schneider et al., 2011; Verfaillie & Bidwell, 2000), revenues in monetary units, or sales in monetary units (GRI, 2016). Despite a preference for units of products or production volume in physical units in the literature, these standardisation metrics are disadvantageous as they harm comparability. "Apples and oranges" cannot be compared meaningfully, and neither can one kilogram of "apples" and one kilogram of "oranges". Cubas-Díaz and Martínez Sedano's (2018) statement that benchmarks are only meaningful across companies of the same industry applies. To enable meaningful multilevel object comparability, the standardising measure should be stated in monetary units. However, revenues and sales as recommended by GRI (2016) are inexpedient. First, costs are not but should be deducted because they include goods and services used up in the production process (see Section 4.3.2.1; EC et al., 2009). Second, revenues and sales are not but should be comparable to the macro level (see Section 4.3.2). GVA overcomes both shortcomings: It does not include intermediate consumption, and it links the meso and the macro levels because it measures an economic object's contribution to GDP (see Section 4.3.2.1; EC et al., 2009). Furthermore, recall that the GDP quantifies the size of an economy in terms of monetary market value (see Section 2.2.3; e.g. van den Bergh, 2009), and therefore, GVA as a standardisation measure exactly meets its purpose. The GVA and, respectively, the GDP approach is also used by, e.g. Eurostat (2018); and UN (2018).

<sup>35</sup>In a macro-economic context, the production value is the value that quantifies all activities of an establishment. It comprises the production of goods and provision of services to another unit of the same establishment. In contrast, the output only includes production, disregarding internal provisions, and should thus be the generally preferred measure (see Section 4.3.2.1; EC et al., 2009).

<sup>36</sup>Huppes and Ishikawa (2005) further classify measures on environmental improvements such as environmental cost-effectiveness as eco-efficiency measures. However, as they regard effectiveness, they are classified as effectiveness indicators in this work (see below).

Moreover, reporting units of the environmental domain are controversially discussed. Assessment methods may involve transformation of physical to monetary units (see Section 3.2). However, it was already pointed out in the late 1990s that market prices should not be assigned to ecosystem services. Monetary-based approaches mislead and distort the analysis, irrespective of the assignment mechanism. Several reasons are demonstrated: Biophysical properties are endogenous qualities that are independent of current prices, and thus, prices cannot reflect biophysical scarcity; nature's goods and services are rather complements than substitutes; future biophysical goods and services cannot be discounted as money can; and last, money can grow but nature cannot (Prescott-Allen, 2001; Rees & Wackernagel, 1999; Wackernagel & Rees, 1996). Additionally, empirical studies demonstrate the difficulty in monetisation of environmental impacts: Wide value ranges result, and clear pricing cannot be achieved (e.g. Antheaume, 2004; Epstein et al., 2011). In conclusion, transfers to monetary units should be refrained from, and units ought to be retained according to their domains: physical units in the environmental domain and monetary units in the economic domain. Efficiency indicators may feature mixed units.

Some scholars regard efficiency indicators as valuable tools and improved measures because they link sustainable development influences and economic performances, facilitating management and decision making (e.g. Charmondusit, Phatarachaisakul & Prasertpong, 2014; Gusmão Caiado, de Freitas Dias, Veiga Mattos, Gonçalves Quelhas & Leal Filho, 2017; Maxime et al., 2006; Müller, Holmes, Deurer & Clothier, 2015; Uhlman & Saling, 2010). Spangenberg (2015) even argues that data on sustainable development influences are meaningless if not put in relation to their generating activity.

Nonetheless, efficiency indicators require caution. For instance, eco-efficiency indicators reflect trade-offs between the environmental and the economic domains (Carvalho, Govindan, Azevedo & Cruz-Machado, 2017; Gusmão Caiado et al., 2017). However, tensions of oppositional sustainable development elements. Therefore, efficiency indicators need to be coupled with further indicators (Gusmão Caiado et al., 2017; B. Zhang, Bi, Fan, Yuan & Ge, 2008), contradicting Spangenberg's (2015) autocracy on efficiency indicators. Figge and Hahn (2004) suggest absolute measures to accompany relative measures (see Table 3.1). Managing only relative decoupling is not sufficient, but absolute decoupling should be overseen additionally. The inclusion of absolute measures would sacrifice comparability across economic objects n. If economic sizes are unknown, "apples" are compared to "oranges" (see above). Growth rates assist to circumvent this inherent trade-off between comparability and inclusion of absolute measures. Growth rates indicate percentage changes to a prior time period and are thus relative measures that capture effectiveness. As sustainability is a long-term goal (see Section 2.1; Dragicevic, 2018), long-term growth rates are the effectiveness indicators of the MLSDI.

In conclusion, the MLSDI deploys three approaches to compute the set of key indicators c<sub>4</sub>: First, key figures x are standardised by an economic object's size in terms of GVA, second, key figures x are standardised by another reference, and third, key figures x are expressed in growth rates from the first period (t = 1) to the last period (t = T) of the time horizon. Clearly, the first type is an intensity indicator referring to efficiency, while the latter reflects effectiveness. Intensity indicators instead of productivity indicators are computed, given their popularity (e.g. Eurostat, 2018; UN, 2018).<sup>37</sup> According to the definition of, e.g. Maxime et al. (2006), the MLSDI's second type of key indicators y does not depict intensity indicators because the reference is rather a total of the respective sustainable development influence (e.g. share of marginally-employed employees; Table 5.6). However, this type of key indicator y may be regarded as an intensity indicator in a broader sense because the calculation scheme is identical. The MLSDI adopts the broader view, and a sustainable development ratio indicator y<sub>r</sub>, referring to efficiency, reads:

$$y_r(n, t, r) = \frac{x(n, t, r)}{x_{\text{std}}(n, t, r)}, \tag{4.8}$$

where y<sub>r</sub> ∈ [1, Y<sub>r</sub>], and x<sub>std</sub> ∈ [1, X] portrays a standardising key figure with x<sub>std</sub> ≠ x. A sustainable development growth indicator y<sub>g</sub>, reflecting effectiveness, is calculated by:

$$y_g(n, r) = \frac{x(n, t = T, r) - x(n, t = 1, r)}{x(n, t = 1, r)}, \tag{4.9}$$

where y<sub>g</sub> ∈ [1, Y<sub>g</sub>].
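Equations 4.8 and 4.9 translate directly into code. The following minimal sketch computes both indicator types for one economic object n and one region r; the function names and the numerical values in the example are hypothetical.

```python
from typing import Sequence


def ratio_indicator(x: float, x_std: float) -> float:
    """Ratio indicator y_r = x / x_std for one time period (Equation 4.8)."""
    if x_std == 0:
        raise ZeroDivisionError("the standardising key figure must be non-zero")
    return x / x_std


def growth_indicator(series: Sequence[float]) -> float:
    """Growth indicator y_g from the first (t = 1) to the last period (t = T)
    of the time horizon (Equation 4.9)."""
    first, last = series[0], series[-1]
    if first == 0:
        raise ZeroDivisionError("the first-period key figure must be non-zero")
    return (last - first) / first


# Hypothetical key figure of 50 physical units standardised by a GVA of 200
# monetary units gives an intensity of 0.25 physical units per monetary unit.
intensity = ratio_indicator(50.0, 200.0)
# A key figure rising from 100 to 120 over the time horizon grows by 20 %.
growth = growth_indicator([100.0, 110.0, 120.0])
```

Only the first and the last period enter the growth indicator, so intermediate fluctuations do not affect it.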

At this point, the effective direction ξ of a key indicator y can be positive or negative. Key indicators y with a positive effective direction ξ<sup>+</sup> increase their sustainable development performance with an increasing score, whereas key indicators y with a negative effective direction ξ<sup>−</sup> decrease their sustainable development performance with an increasing score (Krajnc & Glavič, 2005). Harmonisation of the key indicators' effective directions ξ is accomplished during the scaling process (see Section 4.3.6). Previous to that, outliers are detected and treated in the next section, Section 4.3.5.

<sup>37</sup>However, several indicators will be changed to productivity indicators later on (see Table 5.10).

#### **4.3.5 Outlier detection and treatment**

An outlying observation, outlier, or anomaly is defined as a data point that deviates significantly from other members of the sample (Barnett & Lewis, 1994; Grubbs, 1969; Han et al., 2012). Assuming that at least 50% of the data set is homogeneous, outliers represent the minority (Hadi et al., 2009), not fitting the normal pattern (Aggarwal, 2017; Barnett & Lewis, 1994). Outliers need to be detected and treated because statistical analyses customarily assume homogeneous data (Hadi et al., 2009). Otherwise, the assessment principle methodological soundness (credibility, validity, and reliability) would be violated (see Table 3.1; Cash et al., 2003; Janoušková et al., 2018). In index construction, the scaling process especially suffers from outliers (see Section 4.3.6; Nardo et al., 2008) because outliers are extreme values (Barnett & Lewis, 1994), setting a scale's limits. The weighting process is indirectly affected via scales (see Section 4.3.7).

In outlier detection, data points with significantly diverging behaviour are identified (Han et al., 2012). The outlier rate β is the ratio of outlying to total data and alludes to the severity of the outlier problem. Outlier treatment regards the handling process. Criticism on "overidentifying" outliers is expressed by, e.g. McGregor and Pouw (2017). Outlier treatment distorts the true picture of data by ignoring the minority of cases and focusing on average behaviour. Information loss as expressed for aggregation by Zhou et al. (2010) is caused. Therefore, when determining the MLSDI's outlier detection and treatment method, the trade-off between statistical distortion and distortion of the true picture is taken into account to balance statistical bias and information loss. Furthermore, temporal comparability and progress analysis (see Section 4.3.6.1; Nardo et al., 2008) should be enabled. Outlier handling should thus – similar to scales and weights (see Section 4.3.6 and Section 4.3.7) – be time invariant. With respect to geographical regions r, variability is suggested by Nilsson et al. (2016), such that countries can interpret progress in sustainable development according to their national circumstances. This approach disables country comparison and should be abandoned if the goal is to conduct multinational analyses.

In the following, outliers are characterised (see Section 4.3.5.1), and the MLSDI's detection and treatment method is established (see Section 4.3.5.2).

**Figure 4.7** Spectrum from normal data to strong outliers (based on Aggarwal, 2017; with friendly permission of © Springer International Publishing AG 2017)

#### **4.3.5.1 Characterisation of outliers**

Similar to missing values, outliers can be characterised according to their pattern, degree, and mechanism (see Section 4.3.3.1).<sup>38</sup> Regarding the pattern, an outlier can be, among others, global or local. Global outliers deviate significantly from the entire sample, whereas local outliers differ from the local area (Han et al., 2012). The degree of outlyingness may be weak or strong. Borders are fluid, and the spectrum from normal data over weak outliers to strong outliers is illustrated in a simple flow diagram in Figure 4.7. The underlying mechanism that generates outliers can be classified into three types. First, outliers may exist because of a measurement error in the data generation process. Second, an error in the data collection might have occurred, also known as an execution error. Third, inherent variability, which is a natural variation in the population, may cause anomalies in the data (Barnett & Lewis, 1994).

For the MLSDI, outliers are assumed to be present due to inherent variability. In this case, overidentifying outliers and distortion of the true picture (see Section 4.3.5; McGregor & Pouw, 2017) causes information loss. Therefore, only global, strong outliers are aimed to be identified and treated. The following section, Section 4.3.5.2, determines outlier detection and treatment methods that satisfy this setting.

#### **4.3.5.2 Univariate Interquartile Range (IQR) method**

Simple univariate outlier detection methods establish outlier thresholds based on a combination of single measures. Examples include the mean and standard deviation, median and median absolute deviation, and skewness and kurtosis. Recommended thresholds for these measures can be found in, e.g. Aggarwal (2017); and Field (2009). Mean and standard deviation are sensitive to outliers. Outliers inflate these measures such that they suffer from masking (Field, 2009). Masking occurs when an outlier is not detected as such (Hadi et al., 2009). Skewness and kurtosis also suffer from masking because they are based on the mean and the standard deviation (Field, 2009). The median and the median absolute deviation remain robust measures in simple outlier detection (Leys, Ley, Klein, Bernard & Licata, 2013).
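The masking effect described above can be illustrated with a minimal sketch. It is not part of the MLSDI: the sample values, the function names, the common cut-off of 2.5 for both rules, and the consistency constant 1.4826 applied to the median absolute deviation are assumptions chosen for illustration only.

```python
import statistics

# Hypothetical sample: nine ordinary values, one moderate and one extreme outlier.
SAMPLE = [10, 11, 12, 13, 14, 15, 16, 17, 18, 100, 1000]


def flag_mean_sd(values, cutoff=2.5):
    """Flag values lying more than `cutoff` standard deviations from the mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation
    return [v for v in values if abs(v - mean) / sd > cutoff]


def flag_median_mad(values, cutoff=2.5, consistency=1.4826):
    """Flag values lying more than `cutoff` scaled MADs from the median."""
    median = statistics.median(values)
    mad = consistency * statistics.median([abs(v - median) for v in values])
    return [v for v in values if abs(v - median) / mad > cutoff]


# The extreme value inflates mean and standard deviation, so the
# moderate outlier (100) is masked under the mean-based rule.
print(flag_mean_sd(SAMPLE))     # [1000]
print(flag_median_mad(SAMPLE))  # [100, 1000]
```

In this sample, the value 1000 inflates the standard deviation to such an extent that 100 lies well within the mean-based limits, whereas the median-based rule flags both values.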

More advanced multivariate outlier detection models include statistical methods, proximity-based methods, or clustering-based methods. In statistical methods, observations that deviate significantly from the assumed distribution are outliers. Proximity-based methods detect outliers based on proximity measures from a data point to its neighbours. Last, clustering-based methods declare data points as outliers that belong to a small or no cluster (Han et al., 2012). Each method has advantages and disadvantages, and details can be found in, e.g. Aggarwal (2017); and Han et al. (2012). Simulation studies suggest preferring proximity-based over clustering-based methods (e.g. Aggarwal & Sathe, 2015, 2017; Goldstein & Uchida, 2016), and generally, simple intuitive models are likely to yield better results than highly complex models (Aggarwal, 2017).

<sup>38</sup>In contrast to the academic literature on missing values (see Section 4.3.3.1; e.g. Little & Rubin, 2002) the literature on outliers (e.g. Aggarwal, 2017) does not explicitly use these terms.

For the MLSDI, outliers are detected by univariate methods because the primary goal of outlier detection is the reduction of scale distortion (see Section 4.3.5), and scaling is a univariate task (see Section 4.3.6.1). Two robust univariate outlier detection methods that are based on the median and the median absolute deviation are available. First, the Interquartile Range (IQR) method classifies an observation as outlying if it exceeds the upper or falls below the lower outlier threshold θ. The thresholds are defined by:

$$\left\{ \begin{array}{l} \theta\_{\max}(y,r) = Q\_3(y,r) + \alpha \cdot q(y,r) \\ \theta\_{\min}(y,r) = Q\_1(y,r) - \alpha \cdot q(y,r) \end{array} \right\},\tag{4.10}$$

where θ<sub>max</sub> is the upper threshold, θ<sub>min</sub> represents the lower threshold, α portrays the outlier coefficient, Q<sub>3</sub> is the 75th percentile, Q<sub>1</sub> depicts the 25th percentile, and q measures the IQR. The outlier coefficient α is typically set equal to 1.5. The 75th percentile is also called the third or upper quartile and cuts off the highest 25% of the data. Accordingly, the 25th percentile is also referred to as the first or lower quartile and truncates the lowest 25% of the data (Aggarwal, 2017; Han et al., 2012). Last, the IQR q is described by:

$$q(y, r) = Q\_3(y, r) - Q\_1(y, r). \tag{4.11}$$

The second method that is based on the median and the median absolute deviation is suggested by Leys et al. (2013). The 75th and the 25th percentiles Q<sub>3</sub> and Q<sub>1</sub> of Equation (4.10) are replaced by the median, and the IQR q is substituted by the median absolute deviation. The outlier coefficient α is recommended to be set equal to 2.5. Both methods are essentially the same because they are based on deviations from the median. As the IQR method is more widespread and used in, for example, boxplots (see Figure 5.7b and Figure 5.8b; e.g. Han et al., 2012), the IQR method is applied in the MLSDI, with the typical coefficient α set equal to 1.5.

After outlier detection, outlier treatment is the next step. It can be conducted in four ways: Outliers may be removed and ignored; data may be transformed, such that outliers do not occur; outliers may be weighted less; or the score of the outlying observation may be changed (Field, 2009). Analogous to addressing missing values, removing and weighting are procedures that yield invalid inferences (see Section 4.3.3; Rässler et al., 2013). Furthermore, transformations are not recommended in index calculation. First, a transformation is a form of scaling (Pollesch & Dale, 2016), and clarity may be forfeited if it is performed in addition to scaling for indicator comparability (see Section 4.3.6). Second, particularly non-linear transformations are harmful in index calculation because they impact correlations (Oh & Lee, 1994), while the determination of the key indicators' weights is based on correlation analysis (see Section 4.3.7). In conclusion, an outlying sustainable development key indicator y<sub>o</sub> is treated by changing its score to the thresholds θ:

$$y(n,t,r) = y\_o(n,t,r) = \begin{cases} \theta\_{max}(y,r), & \text{if } y(n,t,r) > \theta\_{max}(y,r) \\ \theta\_{min}(y,r), & \text{if } y(n,t,r) < \theta\_{min}(y,r) \end{cases},\tag{4.12}$$

where y<sub>o</sub> ∈ [1, Y<sub>o</sub>].
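Equations (4.10) to (4.12) can be sketched in a few lines of Python. This is a minimal illustration, not the MLSDI's implementation: the function names and sample data are hypothetical, and the quartile definition (linear interpolation, `method="inclusive"`) is an assumption because the text does not specify one.

```python
import statistics


def iqr_thresholds(values, alpha=1.5):
    """Return (theta_min, theta_max) following Equations (4.10) and (4.11)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    q = q3 - q1  # interquartile range, Equation (4.11)
    return q1 - alpha * q, q3 + alpha * q


def treat_outliers(scores, alpha=1.5):
    """Set outlying scores to the nearest threshold, as in Equation (4.12).

    `scores` maps (economic object n, time period t) to the value of one key
    indicator y in one region r. The thresholds are pooled over all n and t,
    so they do not vary over time.
    """
    theta_min, theta_max = iqr_thresholds(list(scores.values()), alpha)
    return {key: min(max(v, theta_min), theta_max) for key, v in scores.items()}


# Hypothetical scores of one key indicator for three objects and two years.
raw = {
    ("n1", 2015): 12, ("n1", 2016): 13,
    ("n2", 2015): 14, ("n2", 2016): 15,
    ("n3", 2015): 16, ("n3", 2016): 90,
}
treated = treat_outliers(raw)
print(iqr_thresholds(list(raw.values())))  # (9.5, 19.5)
print(treated[("n3", 2016)])               # 19.5
```

Only the score of 90 exceeds the upper threshold and is set to 19.5; all other scores remain unchanged.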

The MLSDI's outlier detection and treatment cannot be tested because it is an unsupervised problem setting. The true outlyingness is unknown and impossible to learn (Aggarwal, 2017).

#### **4.3.6 Scaling**

By definition, a variety of key indicators y are reported in index calculation. These typically feature diverse units (Pollesch & Dale, 2016), such that the required cross-indicator comparability (see Table 3.1; e.g. Pintér et al., 2018) and meaningful aggregation (see Section 4.3.8; e.g. Ebert & Welsch, 2004) for methodological soundness in terms of credibility, validity, and reliability (see Table 3.1; e.g. Cash et al., 2003; Janoušková et al., 2018) are not guaranteed. To ensure achievement of these principles, key indicators y are scaled. As stated in Section 4.3.4, scaling is a univariate problem and refers to the transformation of diverse scales into one common scale (Pollesch & Dale, 2016). The term "scaling" is exclusively used for the present calculation step of unifying key indicators' scales. Scales are time invariant but may vary over geographical regions (see Section 4.3.5; Nardo et al., 2008; Nilsson et al., 2016).

Non-internal scaling depends on additional exogenous data (Pollesch & Dale, 2016) and should be deployed in sustainable development indices to incorporate targets and boundaries, enabling the assessment principle target and boundary orientedness (see Table 3.1; Sala et al., 2015). Resulting scores from this type of scaling can then be interpreted as distance to target (Moldan et al., 2012; Pollesch & Dale, 2016). The scaling procedure should also minimise information loss (Zhou et al., 2006; Zhou et al., 2010), and resulting scales should be easily understandable to effectively communicate an index's results, attracting a broad audience (see Table 3.1; e.g. Pintér et al., 2018).

In the following, scales are characterised in Section 4.3.6.1, and the MLSDI's scaling procedure is derived and described in Section 4.3.6.2.

#### **4.3.6.1 Characterisation of scales**

To fully understand a sustainable development index's scaling problem, several definitions are introduced. Subsequently, scales of a sustainable development index are characterised. A scale is the dimension (e.g. temporal, spatial, or analytical) used to measure a phenomenon. Its extent forms the overall size or magnitude, and its resolution regards the precision (Gibson et al., 2000; Rotmans, 2002). An absolute scale is objectively calibrated, whereas a relative scale is a transformation of the former to picture relationships of objects to each other (Gibson et al., 2000; Turner, Dale & Gardner, 1989). A scale's type can be nominal, ordinal, interval, or ratio. A nominal scale assigns labels; an ordinal scale results from rank ordering (Pollesch & Dale, 2016; Stevens, 1946); an interval scale preserves constant distances between values, and zero does not indicate absence of a variable; and last, a ratio scale is characterised by a natural fixed origin, with a vanished variable at zero. The type of scale determines the form of a variable's comparability. A nominal variable's equality may be ascertained, an ordinal variable's ordinal position may be determined, an interval variable's absolute differences may be evaluated, and a relative distance of a ratio variable may be assessed (Ebert & Welsch, 2004; Pollesch & Dale, 2016; Stevens, 1946).

A sustainable development index's scales correspond to the conceptual framework's dimensions (see Figure 2.11). Table 4.3 reports the technical dimension, extent, resolution, hierarchy, relation, and type of each conceptual dimension that is captured in the MLSDI. The temporal horizon contains yearly reported time periods t and is an absolute interval scale (Gibson et al., 2000; Stevens, 1946). The contentual domain is an analytical scale (Gibson et al., 2000) and is composed of key indicators y with diverse units on relative ratio scales (see Section 4.3.4; Pollesch & Dale, 2015). Geographical regions r are recorded in countries on an absolute nominal scale.<sup>39</sup> The change agent group business is a quantitative dimension (Gibson et al., 2000) with economic objects n on an absolute nominal scale. Hierarchical ordering of economic objects n occurs when incorporating the multilevel perspective (see Section 2.3.1; Rotmans et al., 2001) and the dimension aggregational size. The aggregational size is a functional dimension (Rotmans, 2002), which is organised in an inclusive hierarchy (see Section 4.3.1; Gibson et al., 2000) and features a trivariate resolution (micro, meso, and macro). The decisional tier is also classified as a functional dimension, with the trivariate options operational, strategic, and normative (see Section 2.3.2; e.g. Baumgartner, 2014). All scales except the functional scales are captured in the MLSDI's data structure (see Figure 4.5). The aggregational size is included in the economic objects n, and the decisional tier is addressed before and after the calculation in conceptualisation and decision making.

<sup>39</sup>Any other resolution for time periods t and geographical regions r is possible but may be limited by data availability.


**Table 4.3** Scale characterisation of the conceptual dimensions of the Multilevel Sustainable Development Index (MLSDI); n, economic object; N, number of economic objects; r, geographical region; R, number of geographical regions; t, time period; T, number of time periods; y, sustainable development key indicator; Y, number of sustainable development key indicators

The required scaling procedure regards the harmonisation of the different units of the key indicators y and is determined in the next section, Section 4.3.6.2. The temporal dimension's scale is already comparable. In the case of the economic objects n and the geographical regions r, comparability that goes beyond the scope of nominal scales (equality check) has already been reached via the standardisation procedure in Section 4.3.4.

#### **4.3.6.2 Rescaling between ten and 100**

Generally, scaling may result in common monetary units, physical units, or unitless performance scores (Prescott-Allen, 2001). In Section 4.3.4, it has been emphasised that physical units should not be transferred to monetary units (e.g. Rees & Wackernagel, 1999) and vice versa. Scaling to unitless performance scores remains the only option. Respective methods include, among others, ranking, growth rates, z-scores, logarithmic transformation, ratio scaling, and rescaling. With ranking, key indicators y are scaled by determining an order; growth rates represent percentage changes to a reference; z-scores feature a sample mean of zero and a standard deviation of one; logarithmic transformation applies a logarithmic function; ratio scaling divides the key indicator y by a reference value such as a target; and rescaling assigns new scores on a defined range (Field, 2009; Nardo et al., 2008; Pollesch & Dale, 2016).

In sustainable development index calculation, rankings, growth rates, z-scores, and logarithmic transformations are not suitable. Rankings would reduce the key indicators y to ordinal scales, leading to information loss. Growth rates would entail information loss of the original scores (Nardo et al., 2008), and growth rates are not able to include targets and boundaries.<sup>40</sup> Z-scores are difficult to interpret because a z-score indicates the distance to the mean measured in standard deviations. Furthermore, z-scores are defined on positive and negative value ranges, limiting the possibilities of aggregation (see Section 4.3.8). However, z-scores are required for multivariate statistical weighting techniques (see Section 4.3.7.2 and Section 4.3.7.3). Last, because logarithmic transformations are non-linear, they are harmful in index calculation and should not be applied. Non-linear transformations affect correlations (see Section 4.3.5.2; Oh & Lee, 1994), which are investigated in the weighting process (see Section 4.3.7). Nardo et al.'s (2008) statement that arithmetic aggregation of logarithmically transformed indicators is equivalent to geometric aggregation of non-transformed indicators only holds true if weights are not derived by statistical procedures. This in turn is not the ideal approach (see Section 4.3.7). Ratio scaling is a candidate for the key indicators' scaling procedure because it does not result in information loss nor in negative values. Moreover, targets and boundaries can be included. However, ratio scaling affects key indicators y differently depending on their effective direction ξ. Rescaling in combination with target setting stands out as a scaling method (Pollesch & Dale, 2016).<sup>41</sup> Targets and boundaries can be included, and mathematical discrepancies between key indicators y of different effective directions ξ are not present (Pollesch & Dale, 2016). 
Last, resulting scores are straightforward to interpret: The score depicts the performance of an economic object n in time period t in geographical region r relative to the minimum of the rescaling range δ<sub>min</sub> and the maximum of the rescaling range δ<sub>max</sub>. This clear interpretation benefits the assessment principle effective communication (see Table 3.1; e.g. Pintér et al., 2018).
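The argument that non-linear transformations affect correlations, whereas a linear map such as rescaling does not, can be checked with a small sketch. The two indicator series and the helper `pearson` are hypothetical and serve only as an illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


x = [1.0, 10.0, 100.0, 1000.0]  # hypothetical key indicator a
y = [1.0, 2.0, 3.0, 4.0]        # hypothetical key indicator b

r_raw = pearson(x, y)
# Linear map of x onto the range from ten to 100 (the form rescaling takes)
x_linear = [90.0 * (v - min(x)) / (max(x) - min(x)) + 10.0 for v in x]
r_linear = pearson(x_linear, y)
# Non-linear (logarithmic) transformation of x
r_log = pearson([math.log10(v) for v in x], y)

print(round(r_raw, 3), round(r_linear, 3), round(r_log, 3))
```

The correlation is identical before and after the linear map but changes under the logarithmic transformation, which is why the latter would interfere with correlation-based weighting.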

For the MLSDI, key indicators y are rescaled on an identical range from ten to 100. A minimum of zero is avoided because a single score of zero would drive the subsequent geometric aggregation to an overall index score of zero (see Equation (4.25); Saisana & Philippas, 2012). For key indicators y with a positive effective direction ξ<sup>+</sup>, a rescaled score of ten represents the minimum of a sustainable development key indicator in the sample y<sub>min</sub>, and a rescaled score of 100 depicts the maximum of a sustainable development key indicator in the sample y<sub>max</sub>. For key indicators y with a negative effective direction ξ<sup>−</sup>, minima y<sub>min</sub> and maxima y<sub>max</sub> are reversed. Moreover, a score of ten indicates a boundary, whereas a score of 100 denotes a target. If an economic object n exceeds a target, the rescaled score will be higher than 100. However, targets and boundaries have been finalised neither at corporate nor at national levels yet (see Section 6.3; e.g. O'Neill et al., 2018; Whiteman et al., 2013). Therefore, the rescaling range is merely determined by internal data. For positively affecting key indicators y, a rescaled score of 100 represents the sample maximum of the respective key indicator y<sub>max</sub>. For negatively affecting key indicators y, a rescaled score of 100 represents the sample minimum of the respective key indicator y<sub>min</sub>. Scores that exceed 100 are not possible with internal scaling. To realise the above described rescaling, a rescaled sustainable development key indicator y<sub>s</sub> is computed by the following formula (Bravo, 2014; Krajnc & Glavič, 2005; Saisana & Philippas, 2012):

<sup>40</sup>In this context, growth rates would only refer to ratio indicators y<sub>r</sub>. Growth indicators y<sub>g</sub> (see Section 4.3.4) would not require a further scaling.

<sup>41</sup>Pollesch and Dale (2016) refer to rescaling with target setting as "target normalisation". This term is not adopted because it does not indicate the underlying scaling method and could be mistaken for ratio scaling with target setting.

$$y\_s(n,t,r) = \begin{cases} (\delta\_{max} - \delta\_{min}) \frac{y(n,t,r) - y\_{min}(r)}{y\_{max}(r) - y\_{min}(r)} + \delta\_{min}, & \text{if } \xi = \xi^+\\ (\delta\_{max} - \delta\_{min}) \frac{y\_{max}(r) - y(n,t,r)}{y\_{max}(r) - y\_{min}(r)} + \delta\_{min}, & \text{if } \xi = \xi^- \end{cases},\tag{4.13}$$

where y<sub>s</sub> ∈ [1, Y<sub>s</sub>] and Y<sub>s</sub> = Y. A rescaled key indicator y<sub>s</sub> may be a rescaled sustainable development ratio indicator y<sub>rs</sub> ∈ [1, Y<sub>rs</sub>] or a rescaled sustainable development growth indicator y<sub>gs</sub> ∈ [1, Y<sub>gs</sub>]. Because rescaling relies on the extremes (i.e. y<sub>max</sub> and y<sub>min</sub>), it is highly sensitive to outliers (Nardo et al., 2008). Hence, outliers have been detected and treated in the previous calculation step (see Section 4.3.5.2).
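A minimal sketch of Equation (4.13) with internal scaling may look as follows. The function name, the Boolean encoding of the effective direction, and the example values are assumptions made for illustration.

```python
def rescale_indicator(values, positive=True, d_min=10.0, d_max=100.0):
    """Rescale one key indicator onto [d_min, d_max] following Equation (4.13).

    `values` holds the outlier-treated scores of one key indicator y in one
    region r, pooled over economic objects n and time periods t.
    `positive` encodes the effective direction: True for xi+, False for xi-.
    """
    y_min, y_max = min(values), max(values)
    if y_max == y_min:
        raise ValueError("indicator is constant; Equation (4.13) is undefined")
    span = d_max - d_min
    if positive:
        return [span * (v - y_min) / (y_max - y_min) + d_min for v in values]
    return [span * (y_max - v) / (y_max - y_min) + d_min for v in values]


print(rescale_indicator([2, 4, 6], positive=True))   # [10.0, 55.0, 100.0]
print(rescale_indicator([2, 4, 6], positive=False))  # [100.0, 55.0, 10.0]
```

With the lower bound of the range set to ten, no rescaled score equals zero, so a subsequent geometric aggregation cannot collapse to zero because of a single indicator.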

The rescaled scores are interpreted as follows (Prescott-Allen, 2001):


The set of rescaled sustainable development indicators c<sub>4s</sub> is formally described by:

$$c\_{4s} = c\_{4s}(n, y\_s, t, r). \tag{4.14}$$

The rescaled key indicators y<sub>s</sub> are weighted in the following section, Section 4.3.7.

#### **4.3.7 Weighting**

Weighting in index calculation refers to the process of assigning coefficients to the index's underlying variables in order to increase or decrease a variable's importance on the composite measure (Greco et al., 2019; Nardo et al., 2008). In sustainable development index calculation, weighting leads to compliance with the principles synergies and trade-offs as well as relevance: Weighting integrates themes, addresses relationships, determines interconnection of goals, and assesses their unequal contributions to sustainable development (see Table 3.1; e.g. Costanza, Fioramonti & Kubiszewski, 2016; Janoušková et al., 2018). Eventually, weighting closes the knowledge gap (see Section 2.3.3; e.g. Weitz et al., 2018). Moreover, weighting ideally captures the relative benefit or harmfulness to society (T. Hahn & Figge, 2011) and should be objective (see Table 3.1; Sala et al., 2015).

In the MLSDI's weighting procedure, all contentual domains are addressed simultaneously in a multivariate setting because sustainable development is one integrated crisis and not three separate crises (see Section 2.1; WSSD, 2002). However, to account for the unbalanced number of key indicators y within the three contentual domains (see Section 5.3.1), the initially estimated coefficients are adjusted to sum up to one in each domain. An adjusted coefficient is a weight, denoting a key indicator's importance within a domain. An importance factor is a modified weight that signals a key indicator's influence on the overall index (Becker et al., 2017). The modification is accomplished by the rule of three: Weights are related to the number of indicators within a domain, and importance factors are related to the total number of indicators included in the MLSDI. Consistent with outlier detection and scales, weights are time invariant but may vary over geographical regions r (see Section 4.3.5 and Section 4.3.6; Nardo et al., 2008; Nilsson et al., 2016).
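The distinction between weights and importance factors can be sketched as follows. The text does not state the formula behind the rule of three, so the conversion used here (weight multiplied by the domain's share of all indicators) is an assumed reading, and the indicator names, domains, and coefficients are hypothetical.

```python
from collections import defaultdict


def weights_and_importance(coefficients, domains):
    """Turn raw coefficients into domain weights and importance factors.

    coefficients: dict mapping indicator name -> non-negative raw coefficient
    domains:      dict mapping indicator name -> contentual domain label
    """
    domain_sum = defaultdict(float)
    domain_count = defaultdict(int)
    for name, coef in coefficients.items():
        domain_sum[domains[name]] += coef
        domain_count[domains[name]] += 1
    total_count = len(coefficients)

    # Weight: coefficient adjusted so that weights sum to one within a domain.
    weights = {name: coef / domain_sum[domains[name]]
               for name, coef in coefficients.items()}
    # Importance factor (ASSUMED reading of the rule of three): the weight is
    # multiplied by the domain's share of all indicators, so that importance
    # factors sum to one over the whole index.
    importance = {name: w * domain_count[domains[name]] / total_count
                  for name, w in weights.items()}
    return weights, importance


coefficients = {"emissions": 0.2, "water": 0.6, "wages": 0.5}
domains = {"emissions": "environmental", "water": "environmental",
           "wages": "social"}
weights, importance = weights_and_importance(coefficients, domains)
print(weights)
print(importance)
```

Under this reading, equal weights within each domain translate into equal importance factors across the whole index, regardless of how many indicators each domain contains.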

In the following section, Section 4.3.7.1, an overview on weighting methods is given to determine the MLSDI's approach. The applied methods are introduced in Section 4.3.7.2 to Section 4.3.7.4. Statistical tests are performed in Section 4.3.7.5.

#### **4.3.7.1 Overview of weighting methods**

Weighting methods in sustainable development index construction are controversially discussed because a range of possible pathways to sustainability exists (see Section 2.2.4; Leach et al., 2013). These possibilities are coupled with uncertainties in, for example, the environmental domain (see Section 2.2.1; Steffen et al., 2015). Weights of the environmental domain can only be determined properly if the natural scientific relationship is known (see Section 6.3; Ebert & Welsch, 2004). Established targets and boundaries are irrelevant for the weighting method because they are limits expressed in the scales (see Section 4.3.6.2). The possible pathways and uncertainties lead to three different approaches on weighting: expert surveys, equal weighting, and statistical weighting. Expert surveys and inclusion of subjective opinion can be advantageous because, for example, experts are a key source of information in corporate decision making (Escrig-Olmedo, Muñoz-Torres, Fernández-Izquierdo & Rivera-Lirio, 2017). However, subjective methods are severely criticised because subjectivity leads to volatile results, disagreements, and a lack of science (Giannetti, Bonilla, Silva & Villas Bôas de Almeida, 2009; Rogge, 2012). Mixed methods such as multicriteria decision-making methods (e.g. Boggia & Cortina, 2010; Triantaphyllou, 2000) reduce the amount of subjectivity by providing "objective mathematics to process subjective and personal preferences" (Saaty, 2001). One example of such a method is the analytical hierarchy process.

In the analytical hierarchy process, weights are determined by decomposing the problem into a system of hierarchies and comparing the decomposed elements in a pairwise manner (Saaty, 1980; Triantaphyllou, 2000). Despite being diminished, subjectivity remains a critical issue (Zhou et al., 2006) as decision makers might be tempted to take advantage of their mediating power (see Section 3.2; Jesinghaus, 2018). Second, Schmidt-Traub et al. (2017b), among others, argue that equal weighting should be applied because a consensus on weights in expert surveys could not be established, and equal weights would reflect a policy maker's commitment to equal goal priority. Further arguments for equal weighting include simplicity of construction, a lack of theoretical structure to justify other weighting schemes, and inadequate statistical knowledge (Decancq & Lugo, 2013; Greco et al., 2019; Nardo et al., 2008). Top-down equal weighting is an enhanced version of equal weighting because variables are first equally weighted into categories, then categories are equally weighted into domains, and last, domains are equally weighted into an overall index (e.g. Schmidt-Traub et al., 2017b; Zuo et al., 2017). Nilsson et al. (2016) warn against ignoring overlaps of targets and goals: Double counting would occur, resulting in an implicit higher weighting of equally weighted correlated variables (Greco et al., 2019; Nardo et al., 2008). Rogge (2012) also concludes that the simplicity of equal weighting "is often thoroughly misleading". In conclusion, equal weighting is "convenient but [...] universally considered to be wrong" (Chowdhury & Squire, 2006; Decancq & Lugo, 2013; Greco et al., 2019). To tackle synergies and trade-offs as well as relevance, statistical methods must be applied until the natural scientific relationships are known (see above; Ebert & Welsch, 2004) because statistical weighting is least biased and least subjective (Greco et al., 2019; Mayer, 2008; Zhou, Ang & Poh, 2007).

Statistical weighting in index calculation essentially regards data reduction (Mayer, 2008). The sustainable development elements (i.e. the rescaled key indicators y<sub>s</sub>) are cleaned with respect to correlations and mutually included information. Multivariate statistical techniques for dimensionality reduction include a variety of methods, and an overview can be found in, e.g. Meng et al. (2016). In the field of sustainable development assessment, data envelopment analysis, factor analysis, and PCA are conducted (e.g. Bolcárová & Kološta, 2015; Shaker, 2018; Tseng et al., 2018; B. Zhang et al., 2008; Zhou et al., 2007). Data envelopment analysis is not suitable as a weighting method for a sustainable development index because it is a technique for measuring efficiencies of decision-making objects, not being concerned with data reduction (Charnes, Cooper & Rhodes, 1978; Ramanathan, 2003; Rogge, 2012). Moreover, efficiencies are obtained by dividing weighted sums of data outputs by weighted sums of data inputs. Weights in turn are determined by an optimisation function defined by the modeller (Greco et al., 2019). This procedure entails three issues. First, weights maximise the composite indicator (Ramanathan, 2003), while sustainable development index construction is not an optimisation problem. Instead, the index is designed to quantify unsupervised sustainable development performances (see Chapter 3 and Chapter 4; e.g. Bell & Morse, 2008). Data envelopment analysis overemphasises well-performing elements, such that economic objects n may appear as brilliant performers, while they are not (Rogge, 2012). Second, the target function involves a modeller's subjectivity, and third, aggregation by weighted sums does not minimise substitutability as required along with weak sustainability (see Section 2.2.4 and Section 4.3.8). Factor analysis and PCA are dimensionality reduction techniques and generally suitable for weighting. They are closely related to each other but differ in the direction of analysis. Factor analysis is a top-down approach that aims to describe a number of latent factors with a smaller number of observed variables. A model is fitted, and the solution to it is non-unique (Haerdle & Simar, 2012). PCA functions vice versa: PCA is a bottom-up method that reduces observed variables into a smaller number of latent components. Because sustainable development index calculation is an unsupervised modelling task, it is a bottom-up problem setting in which the latent index is driven by the behaviour of the observed variables (Mayer, 2008). Consequently, PCA instead of factor analysis is suitable for weighting. Furthermore, PCA yields one unique solution, such that subjective interpretations are absent (Haerdle & Simar, 2012). However, factor analysis is a useful tool in problem settings such as the one studied by Tseng et al. (2018), in which an exploratory factor analysis is applied to derive latent constructs from underlying, observed attributes of corporate sustainability such as stakeholder management and corporate culture.

The next section, Section 4.3.7.2, describes the PCA as the first method to derive a weight of a sustainable development key indicator ω and an importance factor of a sustainable development key indicator ψ. Two further methods follow in Section 4.3.7.3 and Section 4.3.7.4.

#### **4.3.7.2 Multivariate statistical analysis: Principal Component Analysis (PCA)**

PCA (Pearson, 1901) is a linear, static technique to reduce a data set's dimensionality by only incorporating data that are responsible for a certain variation (Haerdle & Simar, 2012; G. James et al., 2013; Jolliffe, 2002). This technique can be used for determining key indicators' weights ω because rescaled key indicators y<sub>s</sub> that are responsible for more variation in the data set contain more information and should thus receive a higher weight. Because PCA focuses on variances (G. James et al., 2013; Jolliffe, 2002), data must be free of outliers and z-score scaled (see Section 4.3.5.2 and Section 4.3.6.2; Field, 2009). Otherwise, weights of high variance variables would be overestimated (G. James et al., 2013). PCA does not impose a distributional assumption (Jolliffe, 2002), but as linear correlations are investigated, it is assumed that variables are linearly related.

To achieve the dimensionality reduction, data are transformed to a number of latent, uncorrelated Principal Components (PCs), which are sorted in a descending order according to their variation along with the original data set (G. James et al., 2013; Jolliffe, 2002). A system of linear equations is set up and solved subject to several constraints. The linear equations contain original variables and associated coefficients, also referred to as loadings. The first PC is found by maximising the PC's variance subject to the vector of loadings having unit length. This is obtained by setting the sum of squared elements of the vector of loadings equal to one. The second PC is derived by maximising the variance and appending the constraint of being orthogonal to the first PC; the scalar product of the first and the second PCs' loading vectors is set equal to zero. The following PCs are found in a similar fashion. After solving the system for each equation, each PC's loadings and eigenvalue are specified (Jolliffe, 2002). Definitions of eigenvalues are typically complex, mathematical definitions (Field, 2009) and can be found in, e.g. Haerdle and Simar (2012). In PCA, eigenvalues refer to the variance-covariance matrix and reveal the evenness of distribution of variances throughout the data set (Field, 2009). Loadings are stored in a matrix with variables in the rows and PCs in the columns (Jolliffe, 2002). Squaring each element of this matrix yields the substantive importance of a variable to a PC (Field, 2009). To receive the weights, this matrix is multiplied with a vector of variances of the PCs. However, not all PCs are included, but only a few are chosen that adequately account for a certain variation in the data set. Rules for inclusion involve thresholds on eigenvalues and the explained cumulative variance. These thresholds are critically discussed in the literature. Kaiser (1960) suggests including PCs with eigenvalues larger than one as these explain at least one variable. Jolliffe (2002) argues that Kaiser's (1960) criterion is too strict and recommends a threshold of 0.7.
There is evidence that Kaiser's (1960) criterion is accurate if the chosen PCs explain a cumulative variance greater than or equal to 70% with a sample size smaller than 30 or 60% with a sample size greater than 250 (Field, 2009).
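The procedure described above can be sketched for a single time period with NumPy. This is an illustrative sketch, not the implementation used for the MLSDI: the function name, the random example data, the way the two retention rules are combined, and the final normalisation of the weights to a sum of one are assumptions.

```python
import numpy as np


def pca_weights(data, cum_var_target=0.70):
    """Derive indicator weights from a PCA of one (objects x indicators) table."""
    x = np.asarray(data, dtype=float)
    # z-score scaling, so that high-variance indicators do not dominate
    z = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)
    # Eigen-decomposition of the correlation matrix; the columns of `loadings`
    # are unit-length, mutually orthogonal loading vectors
    eigenvalues, loadings = np.linalg.eigh(np.corrcoef(z, rowvar=False))
    order = np.argsort(eigenvalues)[::-1]        # descending variance
    eigenvalues, loadings = eigenvalues[order], loadings[:, order]
    explained = eigenvalues / eigenvalues.sum()  # variance share per PC

    # Retention (ASSUMED combination): keep every PC with an eigenvalue > 1
    # and, if needed, further PCs until the cumulative variance target is met
    n_kaiser = int(np.sum(eigenvalues > 1.0))
    n_cumulative = int(np.searchsorted(np.cumsum(explained), cum_var_target)) + 1
    k = min(max(n_kaiser, n_cumulative), len(eigenvalues))

    # Squared loadings (indicators x PCs) times the retained PCs' variance shares
    raw = (loadings[:, :k] ** 2) @ explained[:k]
    return raw / raw.sum(), k


# Hypothetical data: 40 economic objects, 5 key indicators
rng = np.random.default_rng(0)
example = rng.normal(size=(40, 5))
example[:, 1] += 0.8 * example[:, 0]  # induce some correlation
weights, n_retained = pca_weights(example)
print(weights, n_retained)
```

Because the loadings enter in squared form, the arbitrary sign of each eigenvector does not influence the resulting weights.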

For the MLSDI, the sample size equals 62 (see Section 5.1). Thus, PCs are included if their eigenvalues are larger than one or if they are required to reach a cumulative variance of 70%. The PCA is performed (CRAN, 2019) for each time period t, and a weight of a sustainable development key indicator derived by the PCA ω<sup>PCA</sup> is obtained by applying the arithmetic mean over the time periods t:

$$
\omega^{PCA}(y\_s, r) = \frac{1}{T} \sum\_{t=1}^{T} \omega\_t^{PCA}(y\_s, t, r), \tag{4.15}
$$

where ω<sub>t</sub><sup>PCA</sup> represents a weight of a sustainable development key indicator derived by the PCA in a time period t. The corresponding importance factor of a sustainable development key indicator derived by the PCA ψ<sup>PCA</sup> is formally represented by:

$$
\psi^{PCA} = \psi^{PCA}(y\_s, r). \tag{4.16}
$$

A PC is the weighted sum of the loadings and z-score scaled key indicators y<sub>z</sub>, where y<sub>z</sub> ∈ [1, Y<sub>z</sub>] and Y<sub>z</sub> = Y. It corresponds to a sustainable development key component p, and its set – the set of sustainable development key components c<sub>3</sub> – is formally represented by:

$$c\_3 = c\_3(n, p, t, r),\tag{4.17}$$

where p ∈ [1, P]. However, as the weighted sum is not deployed for aggregation (see Section 4.3.8), key components p and their set c<sub>3</sub> are obsolete.

Disadvantages of the PCA are incorrect assessment of the temporal dimension and limitation to linearity (see above). In the following section, Section 4.3.7.3, the PCA is extended to the Partial Triadic Analysis (PTA) to overcome the first shortcoming of the incorrect temporal assessment.

#### **4.3.7.3 Multivariate statistical analysis: Partial Triadic Analysis (PTA)**

The PTA expands the PCA by incorporating time. Three-dimensional panel data are interpreted as a sequence of two-dimensional tables.<sup>42</sup> In doing so, a multivariate time series structure is captured in three steps. The first step is called interstructure and aims to derive the importance of each time period. A matrix of scalar products between two-dimensional tables is computed to derive temporal weights. In a second step, the weighted sum of the original time series of tables is computed, yielding the so-called compromise matrix. This matrix captures the common structure of the two-dimensional tables. As a last step, rows and columns of all original tables of the time series are projected onto a PCA of the compromise. Thus, this step is called trajectory. The trajectories summarise the variability of the time series around the compromise (Gallego-Álvarez, Galindo-Villardón & Rodríguez-Rosa, 2015; Thioulouse et al., 2004).
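The three steps can be sketched with NumPy as follows. This is a simplified illustration under stated assumptions and not the application used for the MLSDI: the function name and example data are hypothetical, the table weights are normalised to a sum of one by assumption, and only the rows of the tables are projected.

```python
import numpy as np


def partial_triadic_analysis(tables, n_axes=2):
    """Sketch of the three PTA steps for a stack of T tables (T x N x Y)."""
    x = np.asarray(tables, dtype=float)
    # Column-wise z-scores within each table
    x = (x - x.mean(axis=1, keepdims=True)) / x.std(axis=1, ddof=1, keepdims=True)

    # 1. Interstructure: scalar products between all pairs of tables
    flat = x.reshape(x.shape[0], -1)
    scalar_products = flat @ flat.T
    _, vectors = np.linalg.eigh(scalar_products)
    first = vectors[:, -1]                   # eigenvector of the largest eigenvalue
    first = first if first.sum() >= 0 else -first
    table_weights = first / first.sum()      # ASSUMED normalisation to a sum of one

    # 2. Compromise: weighted sum of the tables
    compromise = np.tensordot(table_weights, x, axes=1)  # shape (N, Y)

    # 3. Trajectories: project each table's rows onto the compromise's PCA axes
    _, _, vt = np.linalg.svd(compromise, full_matrices=False)
    axes = vt[:n_axes].T                                  # shape (Y, n_axes)
    trajectories = x @ axes                               # shape (T, N, n_axes)
    return table_weights, compromise, trajectories


# Hypothetical panel: 4 time periods, 6 economic objects, 3 key indicators
rng = np.random.default_rng(1)
base = rng.normal(size=(6, 3))
panel = np.stack([base + 0.05 * rng.normal(size=(6, 3)) for _ in range(4)])
period_weights, compromise, trajectories = partial_triadic_analysis(panel)
print(period_weights)
```

In this example the tables are nearly identical, so the time periods receive almost equal weights; periods whose structure departs from the common pattern would receive lower weights.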

The application utilised in the MLSDI is based on Dray, Dufour and Thioulouse (2018). A weight of a time period derived by the PTA Ω<sup>PTA</sup> is formally denoted by:

$$
\Omega^{PTA} = \Omega^{PTA}(t, r). \tag{4.18}
$$

A weight of a sustainable development key indicator derived by the PTA ω<sup>PTA</sup> is determined similarly to the PCA (see Section 4.3.7.2), but the temporal dimension is implicitly accounted for (see above), such that the arithmetic mean is not required:

$$
\omega^{PTA} = \omega^{PTA}(y\_\*, r). \tag{4.19}
$$

The corresponding importance factor of a sustainable development key indicator derived by the PTA ψ<sup>PTA</sup> is represented by:

<sup>42</sup>Several authors controversially discuss the originality and mathematical details of this approach (e.g. Kroonenberg, 1983; Thioulouse, Simier & Chessel, 2004). According to the research of this work, first versions of temporal extensions date back to Tucker (1964), who extended factor analysis to three-dimensional matrices. Levin (1965) and Tucker (1966) followed this approach and referred to it as "three-mode factor analysis". Kroonenberg (1983) applied the idea to PCA and named it "Partial Triadic Analysis (PTA)". Thioulouse and Chessel (1987) first applied it to ecology.

$$
\psi^{PTA} = \psi^{PTA}(y\_s, r). \tag{4.20}
$$

Discussion on the number of PCs to retain could not be identified in the literature. Transferring Kaiser's (1960) criterion to the PTA and its implicit inclusion of time, PCs with eigenvalues exceeding the number of time periods T are retained. Given the cumulative variance's relative character, its threshold value remains at 70%. Key components p would be determined analogously to their derivation in the PCA but are also redundant (see Section 4.3.7.2).
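The two retention criteria can be sketched as follows. How the criteria are combined is described in Section 4.3.7.2 and not repeated here, so the hypothetical helper `pcs_to_retain` simply reports both counts. The eigenvalues in the usage note are invented numbers for illustration.

```python
import numpy as np


def pcs_to_retain(eigenvalues, eigen_threshold, variance_threshold=0.70):
    """Hypothetical helper: count PCs under both retention criteria.

    eigen_threshold: 1 for Kaiser's criterion in the PCA, or the number of
    time periods T when the criterion is transferred to the PTA.
    """
    ev = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]  # descending order

    # Kaiser-type criterion: eigenvalues strictly above the threshold.
    by_eigenvalue = int(np.sum(ev > eigen_threshold))

    # Cumulative variance criterion: smallest number of PCs whose cumulative
    # share of variance reaches the threshold.
    cumulative_share = np.cumsum(ev) / ev.sum()
    by_variance = int(np.searchsorted(cumulative_share, variance_threshold) + 1)
    return by_eigenvalue, by_variance
```

For example, `pcs_to_retain([12.0, 9.5, 4.0, 1.5], eigen_threshold=9)` applies the PTA variant with T = 9 time periods.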

The following section, Section 4.3.7.4, deals with the Maximum Relevance Minimum Redundancy Backward (MRMRB) algorithm. It is an information-theoretic application that overcomes the shortcoming of the PCA and the PTA of being limited to linearity (see Section 4.3.7.2). Hereafter, the term "PC family" is used when referring to both PCA and PTA. Their weights and importance factors are summarised in the symbols ω<sup>PC</sup> and ψ<sup>PC</sup>, respectively.

#### **4.3.7.4 Information theory: Maximum Relevance Minimum Redundancy Backward (MRMRB) algorithm**

Information theory has its origins in communication theory (Shannon, 1948) but relates to many disciplines nowadays. Of interest for this work are its relations to statistics and computer science (Cover & Thomas, 1991). How can key indicators' weights ω be derived by statistical approaches of information theory, and what are efficient algorithms in application? The motivation for information-theoretic applications is their ability to capture non-linearity as well as their known efficiency and effectiveness (P. E. Meyer, 2008; P. E. Meyer, Lafitte & Bontempi, 2008; Peng, Long & Ding, 2005; Yu & Liu, 2004). Similar to the PC family, information theory is a bottom-up approach, in which the underlying variables drive the index's behaviour (see Section 4.3.7.1; Mayer, 2008).

Information-theoretic index construction may be based on the Fisher information or entropy. Fisher information measures the amount of information that a variable contains about a parameter and is defined in the context of a family of parametric distributions. Similar to the Fisher information is entropy, which also measures the amount of information a variable contains. It is a function of an underlying process's probability distribution and "is a measure of the average uncertainty in the random variable". In contrast to the Fisher information, entropy is non-parametric and defined for all distributions (Cover & Thomas, 1991). Because an index is based on a variety of variables that originate in diverse distributions, entropy is the preferred measure. Mutual information is closely related to entropy and is the reduction of uncertainty in a random variable due to another random variable. It measures the dependency between two random variables but can be extended to be multivariate (Cover & Thomas, 1991).

Moreover, it is also referred to as total correlation and is a natural measure of relevance (Jakulin & Bratko, 2004; P. E. Meyer, 2008; Watanabe, 1960). A variable is relevant if it reduces uncertainty (Kojadinovic, 2005) and if its removal alters the overall or a subset's conditional probability distribution (Kohavi & John, 1997; P. E. Meyer, 2008). In contrast, a variable is redundant if and only if it is not relevant (Yu & Liu, 2004). To yield inference about the variables' relationships, multivariate data are understood as a network, and three steps are carried out: First, data are discretised; second, a matrix containing mutual information is calculated; and third, an inference algorithm is performed. Discretisation is the partitioning of an interval into subintervals. It suffers from information loss because differentiation between values of one interval is not possible (Schäfer & Strimmer, 2005; Yang & Webb, 2009). Nonetheless, estimators are constructed for discrete variables (P. E. Meyer, 2008) because simulation studies provide evidence that discretisation yields better results than basing the analysis on distributional assumptions (Dougherty, Kohavi & Sahami, 1995; Yang & Webb, 2009).
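For reference, entropy, mutual information, and total correlation can be written in their standard textbook form for discrete random variables (cf. Cover & Thomas, 1991; Watanabe, 1960). The symbols A, B, A<sub>i</sub>, and the probability mass function P(·) are generic notation assumed here for illustration only and are not part of the MLSDI notation:

$$
\begin{aligned}
H(A) &= -\sum\_{a} P(a)\,\log P(a), \\
I(A;B) &= H(A) - H(A \mid B) = H(A) + H(B) - H(A,B), \\
C(A\_1,\dots,A\_k) &= \sum\_{i=1}^{k} H(A\_i) - H(A\_1,\dots,A\_k).
\end{aligned}
$$

The second line expresses the reduction of uncertainty in A due to B, and the third line vanishes if and only if the variables are mutually independent.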

Several algorithms were developed to assess gene networks in the field of bioinformatics (e.g. P. E. Meyer et al., 2008). These types of algorithms are of interest in sustainable development index calculation because the individual sustainable development elements also represent a network of mutually correlated nodes that go beyond linear correlations. In this work, the MRMRB algorithm is deployed (P. E. Meyer et al., 2008, 2019) because experiments deliver evidence of superior performance relative to several other algorithms (Bourdakou, Athanasiadis & Spyrou, 2016; P. E. Meyer, Marbach, Roy & Kellis, 2010). The MRMRB algorithm first determines the difference of mutual information between two random variables (i.e. relevance) and the average mutual information along the selected variables (i.e. redundancy). Subsequently, the algorithm ranks these differences, with direct interactions being ranked before indirect interactions. As a third step, backward elimination is performed: Variables with the lowest mutual information are first eliminated from the network (P. E. Meyer et al., 2010). With the MRMRB algorithm, four estimators can be implemented: empirical estimator, Miller-Madow corrected estimator, Shrink entropy estimator, and Schurmann-Grassberger estimator (P. E. Meyer et al., 2008). In calculating the MLSDI, the Miller-Madow corrected estimator is chosen as it corrects the asymptotic bias of the empirical estimator. The Shrink entropy estimator is less general and only suitable for small sample sizes (P. E. Meyer et al., 2008; Schäfer & Strimmer, 2005). The Schurmann-Grassberger estimator is parametric and makes distributional assumptions (P. E. Meyer et al., 2008).

Key indicators y are discretised by equal frequency discretisation. In this discretisation method, the partitioned intervals may be of different sizes, but the frequency of occurrence is identical in each interval (P. E. Meyer, 2008; Yang & Webb, 2009). Especially when combined with the Miller-Madow corrected estimator, this discretisation method is more efficient than methods such as equal width (P. E. Meyer, 2008; Yang & Webb, 2003). The number of intervals controls the variance-bias trade-off in estimation: Too many intervals result in too few data points per interval and an increased variance, whereas too few intervals lead to information loss and an increased bias (see above; Cover & Thomas, 1991; P. E. Meyer, 2008; Yang & Webb, 2009). The recommendation by P. E. Meyer et al. (2008) and Yang and Webb (2003) on the bin size of the interval is followed: The bin size of equal frequency discretisation χ<sub>s</sub>, which depicts the number of economic objects N in one bin, is set equal to the square root of the sample size:

$$
\chi\_s = \sqrt{N}.\tag{4.21}
$$

Given the square root, the number of bins of equal frequency discretisation χ<sub>n</sub> is equivalent to the bin size χ<sub>s</sub>:

$$\chi\_n = \frac{N}{\chi\_s} = \frac{N}{\sqrt{N}} = \sqrt{N} = \chi\_s. \tag{4.22}$$
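Equal frequency discretisation with a square-root number of bins and the Miller-Madow corrected estimator can be sketched in plain Python. This is an illustration and not the implementation by P. E. Meyer et al. (2008). The function names, the rounding of √N to an integer, the tie handling by order of appearance, and the use of natural logarithms are assumptions.

```python
import math
from collections import Counter


def equal_frequency_bins(values, n_bins):
    """Assign each value a bin label so that bins hold (nearly) equal counts."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [0] * len(values)
    for position, index in enumerate(order):
        labels[index] = position * n_bins // len(values)
    return labels


def entropy_miller_madow(labels):
    """Empirical entropy plus the Miller-Madow bias correction (in nats)."""
    n = len(labels)
    counts = Counter(labels)
    empirical = -sum(c / n * math.log(c / n) for c in counts.values())
    # Correction term: (number of non-empty bins - 1) / (2 * sample size).
    return empirical + (len(counts) - 1) / (2 * n)


def mutual_information(x_labels, y_labels):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), each entropy Miller-Madow corrected."""
    joint = list(zip(x_labels, y_labels))
    return (entropy_miller_madow(x_labels)
            + entropy_miller_madow(y_labels)
            - entropy_miller_madow(joint))


# Usage with an invented sample of N = 16 objects, i.e. sqrt(16) = 4 bins.
sample = list(range(16))
n_bins = round(math.sqrt(len(sample)))  # assumption: round to nearest integer
labels = equal_frequency_bins(sample, n_bins)
```

When N is not a perfect square, the bins cannot all hold exactly √N objects, so some rounding convention is needed. The one used above is only one possibility.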

A weight of a sustainable development key indicator derived by the MRMRB algorithm ω<sup>MRMRB</sup> is formally denoted as follows:

$$
\omega^{MRMRB} = \omega^{MRMRB}(y\_s, r). \tag{4.23}
$$

The corresponding importance factor of a sustainable development key indicator derived by the MRMRB algorithm ψ<sup>MRMRB</sup> is formally described by:

$$
\psi^{MRMRB} = \psi^{MRMRB}(y\_s, r). \tag{4.24}
$$

Because the MRMRB algorithm is capable of detecting higher order correlations, it is expected to yield superior results compared to the PC family.

The next section, Section 4.3.7.5, deals with statistical tests of the PC family (see Section 4.3.7.2 and Section 4.3.7.3). The MRMRB algorithm does not require statistical tests because it does not make distributional assumptions (see above), and the total correlation is simply zero in the absence of correlations (Cover & Thomas, 1991).

#### **4.3.7.5 Statistical tests of model assumptions**

The PC family is tested with the Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy (Kaiser, 1970) and Bartlett's test of sphericity (Bartlett, 1950, 1951).<sup>43</sup> The KMO measure is the ratio of the sum of squared correlations between variables to the sum of squared correlations plus the sum of squared partial correlations between variables. It indicates the degree of diffusion in the pattern of correlations: A value close to zero indicates a relatively small numerator and diffusion in the pattern of correlations, whereas a value close to one indicates a relatively large numerator and a compact pattern of correlations. In the latter case, the sample is adequate for performing the PC family (Field, 2009; Kaiser, 1970). Values of the KMO measure and resulting factorial simplicity are interpreted as follows (Kaiser, 1974):

<sup>43</sup>These tests were initially developed for factor analysis but can also be applied to PCA (Field, 2009; Jolliffe, 2002).


To evaluate whether the KMO measure should be based on Pearson's coefficient or Kendall's tau (see Section 4.3.3.3; Field, 2009), normality of the z-score scaled key indicators y<sub>z</sub> is tested. Similar to the key figures x, the univariate Shapiro-Wilk and Kolmogorov-Smirnov tests are performed (see Section 4.3.3.4; e.g. CRAN, 2019). For consistency with the PC family's calculation procedure, tests are performed for each year and averaged subsequently.

Bartlett's test of sphericity examines whether there are PCs to determine. Under the null hypothesis, the correlation matrix is proportional to the identity matrix: Group variances are the same or similar to each other, and covariances are equal or close to zero. In this case, variables are not correlated, and PCs do not exist. The null hypothesis is desired to be rejected with p-values smaller than 0.05 (Field, 2009). The same correlation coefficient (Pearson vs. Kendall) as for the KMO test is chosen.
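Both diagnostics can be sketched with NumPy under the assumption that Pearson's coefficient is used. A Kendall-based variant would replace the correlation matrix. The function name `kmo_and_bartlett` is hypothetical. The p-value is not computed here; it follows from the chi-square distribution with the returned degrees of freedom.

```python
import numpy as np


def kmo_and_bartlett(data):
    """Hypothetical helper returning (KMO, Bartlett chi-square, degrees of freedom).

    data: array of shape (n observations, p variables).
    """
    X = np.asarray(data, dtype=float)
    n, p = X.shape
    R = np.corrcoef(X, rowvar=False)             # Pearson correlation matrix

    # Partial correlations from the inverse of the correlation matrix.
    inverse = np.linalg.inv(R)
    scale = np.sqrt(np.diag(inverse))
    partial = -inverse / np.outer(scale, scale)

    off_diagonal = ~np.eye(p, dtype=bool)
    sum_r2 = np.sum(R[off_diagonal] ** 2)
    sum_partial2 = np.sum(partial[off_diagonal] ** 2)
    kmo = sum_r2 / (sum_r2 + sum_partial2)

    # Bartlett's statistic: -(n - 1 - (2p + 5) / 6) * ln|R|, df = p(p - 1) / 2.
    _, log_det = np.linalg.slogdet(R)
    chi_square = -(n - 1 - (2 * p + 5) / 6) * log_det
    degrees_of_freedom = p * (p - 1) // 2
    return kmo, chi_square, degrees_of_freedom
```

With only two variables, the partial correlation equals the correlation itself, so the KMO measure is 0.5 regardless of the data. The measure is therefore informative only for larger indicator sets.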

Multicollinearity is not an issue for the PC family (Field, 2009) and thus not tested.

#### **4.3.8 Aggregation**

Aggregation theory is an area of mathematics that investigates aggregation functions (Pollesch & Dale, 2015). An index or composite measure is an aggregate, which is a single value that represents "an arbitrary long set of related values" (Pollesch & Dale, 2015). An aggregation function performs the mathematical operation of mapping diverse variables into one aggregate (Grabisch et al., 2009; Pollesch & Dale, 2015). This mathematical operation is called aggregation. Aggregation is considered the major step in index construction (Zhou et al., 2010) because it moderates the degree of substitutability (Grabisch et al., 2009). To map weak sustainability with minimised substitutability (see Section 2.2.4), a compensatory aggregation function ought to be applied because high input components may be offset by low input components and vice versa. In contrast, setoffs are not possible in non-compensatory aggregation functions (Pollesch & Dale, 2015). These hence map strong sustainability. For methodologically sound aggregation in terms of credibility, validity, and reliability (see Table 3.1; Cash et al., 2003; Janoušková et al., 2018), aggregation rules must be obeyed. Ebert and Welsch (2004) show that meaningful aggregation of diverse variables into an aggregate depends on the variables' scales. Their aggregation rules regard the type of scale (interval vs. ratio) as well as non-comparability and comparability of scales. Non-comparable or independent scales are present when all input and output variables are measured on the same scale but do not share the same unit. Comparable or single scales are present when input and output variables share the exact same scale and unit of measurement. In this context, input and output variables refer to the index: Inputs are the unscaled key indicators y and outputs are the resulting composite measures. The aggregation rules' matrix is shown in Table 4.4. Dictatorial ordering is an aggregation function in which one input variable is responsible for the output and is thus non-compensatory (Ebert & Welsch, 2004; Pollesch & Dale, 2015). The geometric mean is equivalent to the weighted product with equal weights (Zhou et al., 2006) and is hence a special case of the weighted product. The same applies to the arithmetic mean and weighted sum. The aggregation rules by Ebert and Welsch (2004) can therefore be extended to the weighted product and weighted sum (see Table 4.4). Geometric aggregation (geometric mean or weighted product) and arithmetic aggregation (arithmetic mean or weighted sum) are both compensatory aggregation functions (Pollesch & Dale, 2015).

**Table 4.4** Aggregation rules (Böhringer & Jochem, 2007; Ebert & Welsch, 2004; Pollesch & Dale, 2015)

Like probably most other sustainable development indices, the MLSDI comprises ratio-scaled, non-comparable key indicators y (see Table 4.3). Therefore, only geometric aggregation is meaningful. Moreover, geometric aggregation entails two advantages. First, it maps weak sustainability with minimised substitutability because it is a compensatory aggregation function that penalises poor performances and rewards good performances (Yoon & Hwang, 1995; Zhou et al., 2006). Balanced performances yield better aggregated scores than unbalanced performances. The lower an indicator's score, the lower the rate of compensation is. If only one indicator equals zero, the composite measure vanishes. To avoid this non-compensatory case, the geometric aggregation is combined with rescaled key indicators y<sub>s</sub> between ten (instead of zero) and 100 (see Section 4.3.6.2; Saisana & Philippas, 2012). Second, the weighted product performs best in respect of information loss: The system of information before aggregation is closest to the system of information after aggregation (Zelený, 1982; Zhou et al., 2006).

The weighted product (Pollesch & Dale, 2015) is applied to aggregate the rescaled key indicators y<sub>s</sub> of a contentual domain, accounting for synergies and trade-offs (see Table 3.1; e.g. Costanza, Fioramonti & Kubiszewski, 2016) and yielding a subindex of a contentual domain d:

$$d(n,t,r) = \prod\_{y\_s=1}^{Y\_s} y\_s(n,t,r)^{\omega(y\_s,r)},\tag{4.25}$$

where d ∈ [1, D]. The set of sustainable development subindices c<sub>2</sub> then reads:

$$c\_2 = c\_2(n, d, t, r). \tag{4.26}$$

To yield the overall MLSDI c<sub>1</sub>, the geometric mean is deployed on the subindices d:

$$c\_1(n,t,r) = \prod\_{d=1}^D d(n,t,r)^{\frac{1}{D}}.\tag{4.27}$$

Statistical weighting of the contentual domains is not feasible because the methods approximately reflect the contentual domains' number of key indicators Y. Scores of the four composite measures – the subindices of each contentual domain d and the overall MLSDI c<sub>1</sub> – are interpreted in the same fashion as the rescaled key indicators' scores (see Section 4.3.6.2).
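Equations 4.25 and 4.27 translate into a few lines of Python. The sketch below assumes that the weights within a contentual domain sum to one and uses invented scores. The function names are hypothetical.

```python
import math


def weighted_product(scores, weights):
    """Subindex of one contentual domain (Equation 4.25).

    scores: rescaled key indicators between 10 and 100.
    weights: assumed to sum to one within the domain.
    """
    return math.prod(s ** w for s, w in zip(scores, weights))


def geometric_mean(subindices):
    """Overall index as the geometric mean of D subindices (Equation 4.27)."""
    return math.prod(subindices) ** (1 / len(subindices))


# Invented example: both score profiles have an arithmetic mean of 50.
balanced = weighted_product([50, 50], [0.5, 0.5])
unbalanced = weighted_product([10, 90], [0.5, 0.5])
```

The balanced profile keeps its score of 50, whereas the unbalanced profile drops to 30. The lower bound of ten prevents a single indicator from driving the product to zero.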

In the final step of the MLSDI, sensitivities are investigated. The following section, Section 4.3.9, outlines the methodology of this investigation.

#### **4.3.9 Sensitivity analyses**

Sensitivity analysis is the study of apportioning variances of the model output to individual sources of uncertainty in the model input (Saisana, Saltelli & Tarantola, 2005; Saltelli et al., 2008; Saltelli, Tarantola, Campolongo & Ratto, 2004). In index construction, sensitivities of each calculation step should be analysed to ensure methodological soundness in terms of credibility, validity, and reliability as well as robustness and transparency (see Table 3.1; Cash et al., 2003; Janoušková et al., 2018; Pintér et al., 2018; Saisana et al., 2005; Sala et al., 2015).

Sophisticated methods for sensitivity analyses include, for instance, elementary effects methods, variance-based methods, factor mapping, and meta-modelling (Saltelli et al., 2008). However, for the MLSDI, profound theoretical and methodological research has been carried out (see Chapter 2 to Section 4.3.8), such that a simple OAT sampling for non-unique calculation steps is sufficient. In an OAT sampling, one parameter is varied at a time (Saltelli et al., 2008). Non-unique calculation steps that involve alternatives are missing value imputation (see Section 4.3.3), outlier detection (see Section 4.3.5), and weighting (see Section 4.3.7). For missing value imputation and weighting, sensitivities of the different presented methods are investigated. Regarding outlier detection, the outlier coefficient α is varied, and three cases are investigated: the outlier coefficient α equals 1.5, 3.0, and infinity. The first case is the base case (see Section 4.3.5.2; e.g. Aggarwal, 2017) and depicts the inner fence, the second case is laxer and constitutes the outer fence (Tukey, 1977), and the last case corresponds to a non-treatment case (see Section 4.3.9). The latter is of importance as distortion of the true picture is a general concern in outlier treatment (see Section 4.3.5; McGregor & Pouw, 2017). Sensitivities are examined by economic objects' average rank shift in the four composite measures and changes in their performance scores (Greco et al., 2019).
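The OAT variation of the outlier coefficient and the average rank shift can be sketched as follows. The fences follow the standard IQR convention (first quartile minus α times the IQR, third quartile plus α times the IQR). The exact quartile definition used for the MLSDI is specified in Section 4.3.5.2 and may differ from the default of Python's `statistics.quantiles`, which is assumed here. The function names, the ranking convention, and all numbers are illustrative.

```python
import math
import statistics


def iqr_fences(values, alpha):
    """Lower and upper fence for outlier coefficient alpha (1.5, 3.0 or inf)."""
    if math.isinf(alpha):
        return -math.inf, math.inf           # non-treatment case: nothing is flagged
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - alpha * iqr, q3 + alpha * iqr


def ranks(scores):
    """Rank 1 for the highest score; ties are broken by order of appearance."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    result = [0] * len(scores)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result


def average_rank_shift(base_scores, alternative_scores):
    """Mean absolute rank difference between the base case and an alternative."""
    base, alternative = ranks(base_scores), ranks(alternative_scores)
    return sum(abs(a - b) for a, b in zip(alternative, base)) / len(base)


# Invented indicator values with one extreme observation.
values = [1, 2, 3, 4, 5, 6, 7, 100]
fences = {alpha: iqr_fences(values, alpha) for alpha in (1.5, 3.0, math.inf)}
```

In this example, the value 100 lies outside the inner and the outer fence and is only left untreated when α is infinite.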

#### **4.4 Summary and interim conclusion**

Thus far, a conceptual framework of sustainable development has been derived, and in doing so, the first four related research gaps – the perspective, operational-to-normative, knowledge, and the sustainability gaps – have been identified and partially addressed. By including the multilevel perspective and the St. Gallen management model in the conceptual framework, the perspective and the operational-to-normative gaps are theoretically closed. Comprehensive and comparable measurement of sustainable development performances by multilevel objects is indispensable for the sustainability transition because sustainable development is a society-level concept and can only be achieved if micro and meso objects contribute. Sustainable development assessment principles that account for the first four related gaps help to determine the most useful analytical tool for a comprehensive and comparable measurement. Indicator sets that include a composite measure stand out as such tools. Indicators are able to map all six dimensions of the conceptual framework, including the aggregational size for multilevel measurement, and are capable of obeying the conceptual as well as the assessment principles. They continue closing the perspective and the operational-to-normative gaps. Sustainable development indices address the knowledge gap by exploring synergies and trade-offs of individual sustainable development elements. However, multilevel sustainable development indices could not be identified in the academic literature, and previous single level indices lack compliance with the assessment principles and exhibit methodological shortcomings. The lack of methodological soundness constitutes the fifth and last research gap. Hence, the MLSDI's main contributions are multilevel applicability and methodological strength.

To quantify meso-level corporate contributions to the macro concept sustainable development, the MLSDI is derived in nine well-researched steps: collection of key figures, preparation of key figures, imputation of missing values, standardisation to key indicators, outlier detection and treatment, scaling, weighting, aggregation, and sensitivity analyses. The data collection of key figures relies on official, open source statistics to address the sustainability gap and ensure the assessment principle transparency. Two methods for missing value imputation are tested: single time series imputation and multiple panel data imputation. The key indicators are determined by aligning the meso GRI and the macro SDG frameworks. Multilevel comparability is established by standardisation to the GVA and further metrics. Macro-level GVA instead of, for example, meso-level profits is chosen because comparable measurement of meso contributions to the macro SDGs is aimed at. Outliers are detected and treated by the IQR method, and key indicators are rescaled between ten and 100. Three weighting methods are examined: the PCA, PTA, and the MRMRB algorithm. The latter is theoretically superior and thus expected to yield more accurate results. Geometric aggregation is implemented to project weak sustainability with minimised substitutability. Sensitivities of the four composite measures – the three subindices and the overall MLSDI – are tested for missing value imputation, outlier detection, and weighting. In conclusion, the MLSDI overcomes previous indices' methodological shortcomings in several aspects:


A summary of the methodological approaches and assessment principle compliance by each calculation step of previous sustainable development indices and the MLSDI is displayed in Table 4.5.

In the following chapter, Chapter 5, the MLSDI is applied to a sample region. The application crafts reliable empirical knowledge about sustainable development performances of this region and empirically tackles the knowledge gap. By broadly disclosing the calculation results, the sustainability gap is further approached.




**Table 4.5** Summary of methodological approaches and assessment principle compliance of previous sustainable development indices and the MLSDI; Y, yes; P, partially; N, no; U, unknown; †, see Section 5.3.1; DJSI, Dow Jones Sustainability Indices; FEEM SI, FEEM Sustainability Index; GDP, Gross Domestic Product; GRI, Global Reporting Initiative; GVA, Gross Value Added; HSDI, Human Sustainable Development Index; ICSD, Composite Sustainable Development Index; IGO, Intergovernmental Organisation; IQR, Interquartile Range; MISD, Mega Index of Sustainable Development; MRMRB, Maximum Relevance Minimum Redundancy Backward algorithm; p.c., per capita; PCA, Principal Component Analysis; PTA, Partial Triadic Analysis; SDG, Sustainable Development Goal; SDGI, Sustainable Development Goal Index; SDI, Sustainable Development Index; SSI, Sustainable Society Index; WI, Wellbeing Index

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

### **Chapter 5**

### **Empirical findings**

In this chapter, the previously developed methodology of the MLSDI (see Chapter 4) is computed for a sample region, and the empirical findings are presented and discussed. Thereby, the knowledge (see Section 2.3.3; e.g. Weitz et al., 2018) and the sustainability gap (see Section 2.3.4; e.g. Hall et al., 2017) are tackled.

This chapter is structured as follows. First, the sample (except the key figures x and the key indicators y) is introduced in Section 5.1. Hereafter, results of the sustainable development key figures are exhibited in Section 5.2: Section 5.2.1 presents results of the data collection and preparation process, and Section 5.2.2 fills the incomplete sample's data gaps. Section 5.3 deals with the multilevel key indicators y. First, they are derived from the meso GRI and the macro SDG frameworks in Section 5.3.1, and second, their empirical findings are analysed. To this end, summary statistics of the unscaled growth indicators y<sub>g</sub> are investigated in Section 5.3.2, whereas an analysis of the unscaled ratio indicators y<sub>r</sub> is refrained from, given their non-comparability. The key indicators' outlier detection and treatment are outlined in Section 5.3.3, and the empirical findings of the cleaned and rescaled key indicators y<sub>s</sub> are examined in Section 5.3.4. Section 5.4 makes the main contribution to closing the knowledge gap, that is, the missing understanding of the dynamic interactions of the individual sustainable development elements. A comparative analysis of weights ω and importance factors ψ by the three applied weighting methods – PCA, PTA, and MRMRB algorithm – is carried out in Section 5.4.3. Section 5.4.1 and Section 5.4.4 deal with the PC family's statistics, and Section 5.4.2 outlines the MRMRB algorithm's diagnostics. Section 5.5 analyses the four composite sustainable development measures' summary statistics (see Section 5.5.1) and results for the selected branches (see Section 5.5.2). Last, sensitivities of the applied methods are tested in Section 5.6.

### **5.1 Data base, objects of investigation, and time periods**

Because the MLSDI's calculation mechanisms are driven by macro-economic objects n (see Section 4.3.1), macro-economic data from official statistics comprise the data base. These statistics deliver the best benchmarks (Carraro et al., 2013) for methodological soundness (see Section 3.1; e.g. Cash et al., 2003) and are open access. Therefore, they are easily acquired (Zuo et al., 2017), and transparency is provided (see Section 3.1; e.g. Pintér et al., 2018). As anticipated in Section 4.3.2, the sample's geographical region r is Germany, and thus, data are collected from the following three official institutions: Destatis, Eurostat, and the Federal Employment Agency (BA). Destatis and Eurostat mainly cover key figures x of the environmental and the economic domain, whereas social key figures x are primarily acquired from the BA. More information on the collected key figures x will follow in Section 5.2.1. The time horizon reaches from 2008 (t = 1) to 2016 (T = 9). Data before 2008 are not comparable as they are released in a predecessor classification of the currently valid NACE Rev. 2 standard (Eurostat, 2008b). 2016 is the most recent year of major statistics by economic objects n at the time of research (e.g. Destatis, 2018h). The macro-economic objects n are industries or branches in NACE (see Section 4.3.2.2) that are organised in an inclusive hierarchy (see Section 4.3.1; Gibson et al., 2000). NACE's granularity varies according to four levels: classes, groups, divisions, and sections. 385 classes nest in 177 groups, 177 groups add up to 64 divisions, and 64 divisions condense into 20 sections (Eurostat, 2008b). Owing to their identifying NACE code, economic objects n at these levels are also said to be classified at four-digit, three-digit, two-digit, or one-digit level, respectively. For the MLSDI, computation at all levels is desired to support the collective responsibility for sustainable development (see Section 2.1; WSSD, 2002).
As many stakeholders as possible should be informed, and a broad audience should be attracted with effective communication (see Section 3.1; e.g. Pintér et al., 2018). However, data for groups and classes are rarely available (i.e. unit non-response occurs), and the MLSDI's determinative economic objects n are divisions at two-digit level. The 64 divisions as well as their superordinate sections are listed in Table A.1 in Appendix A.1. The last two divisions – 97-98 Activities of households as employers; undifferentiated goods- and services-producing activities of households for own use and 99 Activities of extraterritorial organisations and bodies – are omitted due to their frequent zero output (e.g. Destatis, 2018h). Therefore, the sample's number of economic objects N equals 62. In parts of the analysis, the focus is on selected economic objects n rather than on all of them. These selected branches involve the health economy, agricultural sector, manufacturing sector, chemical industry, car industry, service sector, Information Technology (IT) industry, financial industry, real estate industry, and the overall German economy. Sectors correspond to sections at one-digit level, and industries are divisions at two-digit level. These abbreviated denotations and the associated NACE codes (except for the health economy; see below) are enumerated in Table 5.1.

**Table 5.1** Selected branches of the sample (Eurostat, 2008b); IT, Information Technology; n/a, not applicable

The health economy is a cross-sectional industry, and its definition is based on product delimitation performed by the economic research institute WifOR and the Federal Ministry for Economic Affairs and Energy (BMWi) (Gerlach et al., 2018). For consistency with the MLSDI's determinative economic objects n, the health economy is defined at NACE two-digit level in this work. The health economy's stakes in two-digit divisions are attached to Appendix A.2, Table A.2. The health economy is of interest because it contributes most to the German GDP and labour market among the divisions, with GVA and working population shares of 12.1% and 17.0% in 2018, respectively (BMWi, 2019). Furthermore, corporate responsibility<sup>44</sup> reporting in the worldwide health economy features a considerably increasing trend: Its reporting rate grew from 68% in 2015 to 76% in 2017 (KPMG, 2017).<sup>45</sup> The overall German economy and aggregated sectors (agricultural, manufacturing, and service sectors) are selected to attract a broad audience (see above). The chemical industry is worth examining because of its negative impact on the environment and efforts in industry self-regulation (e.g. Johnson, 2012; King & Lenox, 2000). Large corporations such as BASF engage in environmental sustainable development (e.g. Saling et al., 2002; Uhlman & Saling, 2010), voluntary initiatives such as the Responsible Care Program (CEFIC, 2019) are found, and an industry-specific sustainable development index has been developed (AIChE & IfS, 2019). Similar to the health economy, the chemical industry's corporate responsibility reporting rate experienced a substantial increase from 75% in 2015 to 81% in 2017 (KPMG, 2017).<sup>46</sup> In contrast, the German car industry, which is the largest industry in the manufacturing sector in terms of GVA (share of 22.6% in 2016; Destatis, 2018h), rather attracts attention with embroilment in fraud scandals on cars' true Carbon Dioxide (CO<sub>2</sub>) emission factors. A timeline of the fraud scandal can be found in, e.g. Clean Energy Wire (2019). However, the car industry ranks fourth in global corporate responsibility reporting, with a rate of 79% in 2017 (KPMG, 2017).<sup>47</sup> The IT industry is examined due to digitalisation being a global megatrend, requiring enhanced computer programming as well as data and information services across industries and business functions (Alcácer & Cruz-Machado, 2019). Its importance for society is also reflected by the fact that IT skills are addressed in the SDGs (SDG 4.4.1; UN, 2018). The financial industry pursues sustainable development by, for example, the implementation of a sustainable development index (i.e. the DJSI; see Section 3.3.2 and Section 4.2; e.g. RobecoSAM, 2018a) or innovative sustainable products and services (de Bettignies & Lépineux, 2009; Wiek & Weber, 2014). However, sustainable development performances of the financial industry's activities as a whole might be questionable (Wiek & Weber, 2014). In terms of corporate responsibility reporting, the financial industry decreased its rate from 75% in 2015 to 71% in 2017 (KPMG, 2017).<sup>48</sup> Last, the real estate industry is a selected branch because housing prices have risen constantly since 2015 (Eurostat, 2019b), causing debates on inequalities and social justice (Dustmann, Fitzenberger & Zimmermann, 2018). Moreover, it is the biggest two-digit level industry in the service sector in terms of GVA, with a share of 15.9% in 2016 (Destatis, 2018h).

<sup>44</sup>Generally, the present work is concerned with corporate sustainability and not corporate responsibility (see Section 2.3.2; e.g. Bansal & Song, 2017). However, a distinction of these terms is not made in the cited reference (KPMG, 2017), and the original wording is adopted.

<sup>45</sup>Sample: 4,900 companies (the top 100 by revenue in each of 49 countries), thereof corporations allocated to healthcare.

The sample does not include meso-economic objects n yet, but especially corporations are strongly encouraged to quantify their sustainable development performances as advised in this work. Corporations should benchmark their results to the results of macro-economic objects n of this sample in order to derive coordinated actions for improved sustainable development.

The next section, Section 5.2, deals with the sample's key figures x.

#### **5.2 Sustainable development key figures**

This section presents the empirical findings of the calculation steps one to three (see Section 4.3.1 to Section 4.3.3) and is structured accordingly. First, the MLSDI's key figures x are collected, defined, and prepared in Section 5.2.1. Because the key figures x are inferred from the key indicators y (see Section 4.3.1 and Section 4.3.4), derivation of the key figures' significance in relation to sustainable development is postponed to Section 5.3.1. Second, results of the missing value imputation are exhibited and discussed in Section 5.2.2.

<sup>46</sup>Sample: 4,900 top 100 companies in terms of revenues in 49 countries, thereof corporations allocated to chemicals.

<sup>47</sup>Sample: 4,900 top 100 companies in terms of revenues in 49 countries, thereof corporations allocated to automotive.

<sup>48</sup>Sample: 4,900 top 100 companies in terms of revenues in 49 countries, thereof corporations allocated to financial services.

#### **5.2.1 Collection and preparation of sustainable development key figures**

The MLSDI's ideal set of key indicators c<sup>4</sup> is the intersection of the GRI and the SDG frameworks (see Section 4.3.4 and Section 5.3.1). From this intersection, the ideal set of key figures c<sup>5</sup> is inferred (see Section 4.3.1). The actual sets are reduced versions of the ideal sets because of macro-data restrictions by official statistics. Severe item non-response entails too high uncertainties in the imputation process, and the item is excluded from the calculation. Three different forms of severe item non-response are present: A key figure x may be totally unavailable, only available at one-digit level, or only available for several divisions with incomplete sections. The present sample comprises six environmental, 16 social, and 14 economic key figures x, with the total number of key figures X amounting to 36. The unbalanced number of available key figures x across the contentual domains might demonstrate a focus on social and economic issues. However, indicators of the environmental domain are less similar to each other (e.g. the social domain contains four tax indicators; see Table 5.3) and the main topics and impacts are covered by the relatively small number of indicators. Table 5.2 to Table 5.4 list and characterise the MLSDI's environmental, social, and economic key figures x by their statistical classifications and reporting units. Data sources are provided in the last columns of the tables.

Definitions of the key figures x are provided in the following, and if not indicated otherwise, they are compiled from the definitions of their data sources and Eurostat (2019c). The environmental domain reports air emissions (see Table 5.2), which are the amount of pollution of a plant or a product released into the air and include GHG emissions according to the Kyoto Protocol (UNFCCC, 1998). The value of taxes levied on physical units that negatively impact the environment is called environmental tax and involves energy taxes and transport taxes. Energy taxes are composed of the energy tax, electricity tax, emission rights, fee for the Compulsory Oil Storage Association, and the nuclear fuel tax. Transport taxes consist of the motor vehicle tax and the air traffic tax. Hazardous waste refers to the amount of hazardous substances generated by primary producers that require records according to the European regulation of waste (BMJV, 2019b). Primary energy consumption is the amount of energy used in the first place, irrespective of its purpose (energy or non-energy purpose) and conversion losses or other leakages. Waste water is used water that does not fulfil the quality criteria of its initial purpose. The amount of water used by end users is termed water use.

**Table 5.2** List of the environmental key figures; CO2e, Carbon Dioxide Equivalents; CPA, Classification of Products by Activity; NACE, Statistical Classification of Economic Activities in the European Community

The social domain's key figures x encompass the following (see Table 5.3). Apprentices are the number of employees in vocational training. The value of tax levied on taxable incomes of the economic objects n is referred to as the Corporate Income Tax (CIT). The compensation of employees represents the value of remuneration by employers to employees in return for work. It includes gross wages and salaries as well as social insurance contributions by both employers and employees. German compulsory social insurances involve the accident, health, nursing care, and the unemployment insurances. The key figure employees comprises the number of people contracted to carry out work for an employer in return for remuneration. The female labour force is constituted by the number of economically active females and includes female employees, self-employed, and unemployed women.<sup>49</sup> The number of female employees with a compensation below 450 Euro per month or a short-term contract of less than approximately three months is termed female marginally-employed employees. Marginal employment is not subject to participation in the compulsory social insurances. In contrast, the female socially-insured employees are the number of female employees contributing to and benefiting from compulsory social insurances. The gender-unspecific counterparts labour force, marginally-employed employees, and socially-insured employees are defined correspondingly. After defining the last type of employment in the economic domain (see below), relations of the different employment types are established. The allocation of employment key figures to the social as well as the economic domain is based on the key indicators' assignment (see Section 5.3.1.2 and Section 5.3.1.3) and demonstrates the employment's dual purpose: It is a source of income but goes beyond its economic purpose by being key to any successful transition (Harangozo et al., 2018). The local business tax is a local government charge and encompasses the value of tax levied on trade income of business enterprises. By computing the difference between the value of taxes levied on products and subsidies granted for products, the net taxes on products are obtained. Products may be produced or traded goods and services. The number of employees with disability status according to BMJV (2019a) constitutes the key figure severely-disabled employees. The Value Added Tax (VAT) is the value of taxes levied on the value added of goods and services, and is computed as the difference between total VAT and deductible VAT on inputs. The number of hours actually worked by employees (excluding, e.g. holidays and sick days) composes the working hours of employees. Workplaces for severely-disabled employees are the number of mandatory workplaces for severely-disabled employees, set by an employer's type and size.

<sup>49</sup>As unemployed people cannot be assigned to an industry, the (female) labour force is only available for the overall German economy. Industry-specific data are not required as the labour forces only serve the computation of the key indicators y on gender differences, further elaborated in Section 5.3.1.2.

**Table 5.3** List of the social key figures; aHC, average Headcount; CIT, Corporate Income Tax; CPA, Classification of Products by Activity; n/a, not applicable; NACE, Statistical Classification of Economic Activities in the European Community; VAT, Value Added Tax

**Table 5.4** List of the economic key figures; aHC, average Headcount; CPA, Classification of Products by Activity; GVA, Gross Value Added; NACE, Statistical Classification of Economic Activities in the European Community; R&D, Research and Development

Last, the economic domain's key figures x (see Table 5.4) are defined. Consumption of fixed capital is the value of impairment of fixed assets (see below). The value of goods and services that change ownership from residents to non-residents is termed export. Gross fixed assets represent the reinstatement value of stock of fixed assets that are used in production for more than one year. Fixed assets include machinery, equipment, buildings, and other structures. The gross fixed capital formation refers to the value of acquisitions of fixed assets, excluding fixed asset disposals, and is therefore also termed "investment". Definitions of GVA, output, and input can be found in Section 4.3.2.1. The key figure import regards the value of goods and services that change ownership from non-residents to residents; the imported input is defined correspondingly. The value of expenditures within a statistical unit on creative work conducted by own employees to increase the stock and use of knowledge is reported in the internal Research and Development (R&D) expenditures. Net fixed assets regard the current value of stock of fixed assets, which is equivalent to gross fixed assets less the accumulated consumption of fixed capital. The number of employees in the field of R&D depicts the key figure R&D employees. Working hours of working population is the working population's equivalent of working hours of employees, where the working population represents the number of people that perform a production activity. The relations of apprentices, employees, labour force, marginally-employed employees, socially-insured employees, and working population are as follows. The labour force is the broadest key figure x as it comprises employees, self-employed, and unemployed people. The working population is obtained by disregarding unemployed people. Self-employed people are not further distinguished but employees are. These include apprentices, marginally-employed employees, socially-insured employees, and further employment types such as civil servants. However, data of these key figures x are not comparable as they are retrieved from different databases.

Key figures x are generally defined on a positive value range. Exceptions are the VAT and the net taxes on products. Positive values indicate monetary outflows from the object of investigation, and negative values denote monetary inflows to the object of investigation.

Preparation of key figures x covers macro-level transformations from NACE to CPA. These yield standard results and are thus not disclosed. Data sources for the transformation include Destatis' supply tables retrieved from the national accounts (Destatis, 2012c, 2013d, 2015g, 2016h, 2016i, 2017e, 2018i, 2019f).

The following section, Section 5.2.2, turns the incomplete sample into a complete one by missing value imputation.

#### **5.2.2 Imputation of missing values**

The collected sample (see Section 5.2.1) contains item non-responses and features a general missing data pattern. Sustainable development data remain scarce despite the digital era of big data, which is deemed to generate a richness of data and information (Esty, 2018). Of the 36 key figures x, 17 require missing value imputation, and the average rate of missing values λ amounts to 22.63%, with a minimum of 14.87% in 2013 and a maximum of 32.19% in 2008. The missing data patterns of these years are illustrated in Figure 5.1 and Figure 5.2. The x-axis contains the key figures x, while the y-axis comprises the economic objects n. Light patches signal missing data. More than twice as many values are missing in the service sector (λ = 28.13%) as in the manufacturing sector (λ = 13.29%). In Figure 5.1 and Figure 5.2, approximately the upper half represents the manufacturing sector, and approximately the lower half depicts the service sector (see Table A.1).
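The rate of missing values λ reported above is a share of unobserved cells in the data matrix. The following minimal sketch illustrates this calculation on a hypothetical toy matrix; the function name `missing_rate` and all values are assumptions for illustration and do not reproduce the study's data.

```python
import math


def missing_rate(rows):
    """Return the share of missing cells (NaN) in a list of equally long rows."""
    cells = [value for row in rows for value in row]
    missing = sum(1 for value in cells if math.isnan(value))
    return missing / len(cells)


nan = float("nan")

# Hypothetical toy data: rows are economic objects n, columns are key figures x.
manufacturing = [[1.0, 2.0, nan, 4.0], [5.0, 6.0, 7.0, 8.0]]
services = [[nan, 2.0, nan, 4.0], [5.0, nan, 7.0, 8.0]]

lam_manufacturing = missing_rate(manufacturing)    # 1 of 8 cells missing
lam_services = missing_rate(services)              # 3 of 8 cells missing
lam_total = missing_rate(manufacturing + services)  # 4 of 16 cells missing
```

Computing the rate separately per sector and per year, as in this sketch, makes differences such as the one between the manufacturing and the service sectors directly visible.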

**Figure 5.1** Missing data pattern in the German economy in 2008; CIT, Corporate Income Tax; GVA, Gross Value Added; NACE, Statistical Classification of Economic Activities in the European Community; R&D, Research and Development; VAT, Value Added Tax

**Figure 5.2** Missing data pattern in the German economy in 2013; CIT, Corporate Income Tax; GVA, Gross Value Added; NACE, Statistical Classification of Economic Activities in the European Community; R&D, Research and Development; VAT, Value Added Tax

The first approach to overcome the data shortage is single time series imputation (see Section 4.3.3.2). The imputation generally yields stable results, and exemplary results of the key figure import for the selected branches are displayed in Figure 5.3. The import's missing data pattern is monotone in the temporal dimension and thus easily visualised with solid lines for observed data and dashed lines for imputed data. The Kalman smoothing and maximum likelihood estimation (see Section 4.3.3.2; e.g. Harvey, 1989) are applied to the agricultural sector and industries in the manufacturing sector in 2008 and 2009. Industries in the service sector require modified mean imputation as their total time series are unobserved. Because single imputation produces stable results as expected, estimates are considered to be valid. Test results on the model assumptions follow below.
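To illustrate how Kalman smoothing can fill gaps in a single time series, the following sketch implements a local level model with a forward filter and a backward (Rauch-Tung-Striebel) smoother. It is a simplified illustration and not the procedure of Section 4.3.3.2: the variances `obs_var` and `level_var` are fixed by assumption instead of being estimated by maximum likelihood, the prior is centred on the first observed value, and the series values are hypothetical.

```python
import math


def kalman_impute(series, obs_var=1.0, level_var=1.0, init_var=1e6):
    """Fill NaN entries of `series` with smoothed levels of a local level model."""
    n = len(series)
    observed = [v for v in series if not math.isnan(v)]
    if not observed:
        raise ValueError("series has no observed values")

    a_pred, p_pred = [0.0] * n, [0.0] * n  # one-step-ahead predictions
    a_filt, p_filt = [0.0] * n, [0.0] * n  # filtered estimates
    # Assumption: vague prior centred on the first observed value.
    a_pred[0], p_pred[0] = observed[0], init_var

    # Forward pass (Kalman filter); missing observations skip the update step.
    for t in range(n):
        if t > 0:
            a_pred[t] = a_filt[t - 1]
            p_pred[t] = p_filt[t - 1] + level_var
        if math.isnan(series[t]):
            a_filt[t], p_filt[t] = a_pred[t], p_pred[t]
        else:
            gain = p_pred[t] / (p_pred[t] + obs_var)
            a_filt[t] = a_pred[t] + gain * (series[t] - a_pred[t])
            p_filt[t] = (1.0 - gain) * p_pred[t]

    # Backward pass (Rauch-Tung-Striebel smoother).
    a_smooth = a_filt[:]
    for t in range(n - 2, -1, -1):
        j = p_filt[t] / p_pred[t + 1]
        a_smooth[t] = a_filt[t] + j * (a_smooth[t + 1] - a_pred[t + 1])

    # Observed values are kept; only missing entries are replaced.
    return [a_smooth[t] if math.isnan(v) else v for t, v in enumerate(series)]


nan = float("nan")
# Hypothetical series with the first two periods unobserved and one interior gap.
series = [nan, nan, 10.0, 12.0, nan, 13.0]
filled = kalman_impute(series)
```

In this sketch, every imputed value is a weighted combination of the observed values, so it stays within the observed range. A production analysis would estimate the variances from the data and might add trend or seasonal components.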

**Figure 5.3** Single time series imputation on import in billion Euro for the selected branches in the German economy from 2008 to 2016; solid line, observed data; dashed line, imputed data; IT, Information Technology

With regard to Amelia II (see Section 4.3.3.3; e.g. Honaker et al., 2011), dropped, highly correlated key figures x that are free from missing values encompass the compensation of employees, employees, female marginally-employed employees, socially-insured employees, workplaces for severely-disabled employees, consumption of fixed capital, gross fixed assets, net fixed assets, and the output. Kendall's tau is the used correlation coefficient because the key figures x are non-normal (see below). The compensation of employees and the output correlate with the GVA and the input; female marginally-employed employees are associated with marginally-employed employees; female socially-insured employees, socially-insured employees, and employees depend on the working population; workplaces for severely-disabled employees vary along with severely-disabled employees; and the key figures on capital and assets are associated with the gross fixed capital formation. The Amelia II algorithm performs m = 23 imputations, which corresponds to a relative efficiency η of 99.03% (see Equation (4.6)). Amelia II's result of the exemplary key figure import is shown in Figure 5.4. Despite the restricting bounds to the observed range of values, missing values are heavily overestimated for industries in the service sector and moderately overestimated for several industries in the manufacturing sector. Not setting bounds would lead to even higher variances in estimates. The difference in severity of misspecification across the manufacturing and the service sectors may originate in their different rates of missing values λ (see above).
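The reported relative efficiency can be checked numerically. The sketch below assumes that Equation (4.6), which is not reproduced in this section, takes the common form η = (1 + λ/m)<sup>−1</sup> known from the multiple imputation literature; the function name is hypothetical. With the average rate of missing values λ = 22.63% and m = 23 imputations, this form yields the stated 99.03%.

```python
def relative_efficiency(missing_rate, n_imputations):
    """Relative efficiency of multiple imputation, assuming eta = 1 / (1 + lambda / m)."""
    return 1.0 / (1.0 + missing_rate / n_imputations)


# Values taken from the text: lambda = 22.63 %, m = 23 imputations.
eta = relative_efficiency(0.2263, 23)
print(f"{eta:.2%}")  # prints "99.03%"
```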

To verify the assumptions of the imputation models, statistical tests are performed (see Section 4.3.3.4). First, Little's MCAR test cannot be executed because the sample involves key figures x that are missing for an entire time period t. The key figure import in 2008 is such an example (see Figure 5.1). Whether the MAR assumption is valid remains unknown, but minor effects are expected from its violation (see Section 4.3.3.4; e.g. Rässler et al., 2013).

**Figure 5.4** Multiple imputation on import in billion Euro for the selected branches in the German economy from 2008 to 2016; solid line, observed data; dashed line, imputed data; IT, Information Technology

For single time series imputation, the Shapiro-Wilk, Kolmogorov-Smirnov, augmented Dickey-Fuller, and the Ljung-Box tests are performed to investigate normality, stationarity, and i.i.d. of the key figures x and residuals, respectively. Results can be found in Table A.3 to Table A.5 in Appendix A.3. The Shapiro-Wilk test statistics range from 0.1728 for waste water to 0.8082 for output. P-values are less than or equal to 0.0001. The test statistics of the Kolmogorov-Smirnov test vary between 0.5 for the CIT and local business tax and one for several variables, with p-values less than or equal to 0.0001. Both tests yield the same result: The null hypotheses are rejected with p-values less than or equal to 0.0001. The data are non-normal. Non-normality of key figures x is confirmed by examination of histograms, such that type I errors are not expected. Exemplary histograms of import and air emissions in 2016 are displayed in Figure 5.5, visualising the key figures' typical right skewness. The augmented Dickey-Fuller test statistics range from −11.17 for the net taxes on products to −3.88 for the imported input. P-values remain below 0.01, except for the imported input, whose p-value is 0.0152. However, it is still below the decisional threshold of 0.05. The null hypotheses of the augmented Dickey-Fuller tests are rejected, and stationarity of the data is confirmed. The Ljung-Box test statistics' minimum of 0.0001 is obtained for the input, and the maximum of 0.5481 is achieved for the net taxes on products. All p-values exceed the threshold value 0.05, so the residuals are concluded to be i.i.d. These p-values are listed in the last columns of Table A.3 to Table A.5.
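For readers unfamiliar with the Ljung-Box test, the following sketch computes its test statistic Q = n(n + 2) Σ<sub>k</sub> ρ̂<sub>k</sub>² / (n − k) from the sample autocorrelations ρ̂<sub>k</sub> of a residual series. It is an illustration with a hypothetical, deliberately autocorrelated toy series, not a reproduction of the results in Table A.3 to Table A.5. The p-value, which would be obtained from a chi-square distribution, is omitted to keep the example free of external libraries.

```python
def ljung_box_q(residuals, lags):
    """Ljung-Box Q statistic for the first `lags` sample autocorrelations."""
    n = len(residuals)
    mean = sum(residuals) / n
    dev = [r - mean for r in residuals]
    denom = sum(d * d for d in dev)
    total = 0.0
    for k in range(1, lags + 1):
        # Sample autocorrelation at lag k.
        rho_k = sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        total += rho_k ** 2 / (n - k)
    return n * (n + 2) * total


# Hypothetical residuals with strong negative autocorrelation at lag one.
alternating = [1.0, -1.0] * 4
q_stat = ljung_box_q(alternating, lags=1)  # lag-one autocorrelation is -7/8
```

A large Q, as for this alternating series, speaks against independence, whereas small statistics such as those reported above are consistent with i.i.d. residuals.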

Concerning multiple imputation, the multivariate Shapiro-Wilk test yields a test statistic of 0.0327 with a p-value less than or equal to 0.0001. The null hypothesis is rejected, and the data are multivariate non-normal. Overdispersed start values indicate that the Amelia II algorithm functions well. Figure 5.6 illustrates the convergence of the largest PC after two imputations. The largest PC is utilised to summarise the data.

**Figure 5.5** Frequency distribution of import and air emissions in the German economy in 2016; CO2e, Carbon Dioxide Equivalents

In conclusion, data are neither univariate nor multivariate normal. Single time series imputation does not appear to be distorted by the normality violation, and the Kalman filter proves to be an optimal estimator under violation of the normality assumption (see Section 4.3.3.2; Harvey, 1989). The inclusive hierarchy leads to relatively low uncertainty in the imputation process, and the assumption of the temporal dimension being a reliable predictor seems to be valid. In contrast, the Amelia II algorithm yields implausible results, endorsing Demirtas et al.'s (2008) evidence of Amelia II producing biases under non-normal, small samples. The implausible results may further confirm the supposition of cross sections being unreliable predictors in sustainable development assessment: Economic objects n feature unique characteristics with regard to the sustainable development key figures x. Both conclusions on Amelia II's implementation are supported by the diagnostics of algorithm convergence: The algorithm is not the origin of misspecification, but the input data are.

In the following, Amelia II's results are disregarded, and the subsequent calculation is based on the singly imputed set of key figures c5. The next section, Section 5.3, addresses the sustainable development key indicators y.

#### **5.3 Sustainable development key indicators**

This section addresses results of the calculation steps four to six (see Section 4.3.4 to Section 4.3.6) and is organised correspondingly. First, the key indicators y are derived in Section 5.3.1, and results of the growth indicators y<sup>g</sup> are outlined in Section 5.3.2. Empirical findings of the ratio indicators y<sup>r</sup> are not presented because they are reported in diverse units (see Table 5.2 to Table 5.4), such that results are not comparable before scaling (see Section 4.3.6). Outlying key indicators y<sup>o</sup> are removed in Section 5.3.3, and last, cleaned and rescaled key indicators' summary statistics as well as data results of the selected branches are exhibited and analysed in Section 5.3.4.

**Figure 5.6** Convergence of the Amelia II algorithm with overdispersed start values for the largest Principal Component (PC)

#### **5.3.1 Alignment of the Global Reporting Initiative (GRI) and the Sustainable Development Goal (SDG) disclosures**

Based on GRI and UNGC (2018a), this section aligns the meso GRI disclosures with the macro SDG indicators and targets and adjusts the alignment to the MLSDI's key figures x and the key indicators y. Detailed information about the GRI disclosures and the SDG indicators and targets is retrieved from GRI (2016) and UN (2018). The economic domain is further supported by IASB (2018). Hereafter, when referring to both an SDG indicator and an SDG target, the term "SDG disclosure" is used. Because of methodological shortcomings or data restrictions by official statistics (see Section 5.2.1), the alignment is bounded, and adjustments are made. For example, GVA instead of revenue is used as a standardising key figure xstd, or data of a similar variable are acquired. The following sections, Section 5.3.1.1 to Section 5.3.1.3, address the resulting key indicators y by the contentual domains.

#### **5.3.1.1 Environmental sustainable development key indicators**

The environmental domain's GRI and SDG disclosures are mainly concerned with the reduction of absolute negative environmental impacts (i.e. increase of effectiveness) and the reduction of environmental intensities (i.e. increase of efficiency). The latter is achieved by relative decoupling of economic activity and environmental degradation. Environmental key indicators y generally affect sustainable development performances negatively. One exception is indicated below. Table 5.5 shows the MLSDI's environmental key indicators y, their effective directions ξ, and reporting units. Ratio indicators' calculation schemes are indicated, whereas the growth indicators' formula can be found in Equation (4.9).

**Table 5.5** Environmental key indicators and their characterisation; CO2e, Carbon Dioxide Equivalents; GVA, Gross Value Added

As a first topic, air pollution is covered, which is addressed in several GRI and SDG disclosures. Air pollution leads to climate change (Rockström et al., 2009b), a planetary boundary that has been transgressed (see Section 2.2.1; e.g. Steffen et al., 2015). Therefore, there is an urgent need to measure and manage air pollution. Substances into the air should be reduced (SDG 12.4), impacts of ocean acidification ought to be minimised (SDG 14.3), and forests are required to be managed sustainably (SDG 15.2). From a societal perspective, reduction of deaths and illnesses from air pollution should be aimed at (SDG 3.9), and resilience to climate related hazards is required to be strengthened (SDG 13.1). Contributing to the management of these targets, the MLSDI collects data of the key figure air emissions (GRI 305-1) and computes the key indicators growth of air emissions (GRI 305-5)<sup>50</sup> and air emissions intensity (GRI 305-4; SDG 8.4; SDG 9.4.1). The latter is obtained by the ratio of air emissions and GVA (see Table 5.5), specifying the amount of emissions in gram Carbon Dioxide Equivalents (CO2e) released into the air per Euro of generated GVA. A reduction of this ratio indicator y<sup>r</sup> implies a successful relative decoupling of environmental degradation in terms of air emissions and economic activity measured by GVA. All ratio indicators y<sup>r</sup> that are labelled with "intensity" operate in this fashion. Data on GVA are collected in the economic domain (see below).

<sup>50</sup>The GRI disclosure 305-5 comprises reduction of air emissions. However, as data of the key figure air emissions are collected, its growth rate is computed, and its effective direction ξ is accounted for in the scaling procedure (see Section 4.3.6.2 and Section 5.3.4). This case occurs for further key indicators y but is not pointed out repetitively.

A major cause of air emissions is energy consumption as its supply mainly relies on air-polluting technologies (Destatis, 2018f; EEA, 2018). To further support the SDG targets 8.4 and 13.1, natural resources for energy consumption should be managed sustainably and efficiently (SDG 12.2). For this purpose, data on primary energy consumption are acquired, and the key indicators growth of primary energy consumption (GRI 302-4) and energy intensity (GRI 302-3; SDG 7.3.1; SDG 8.4) are encompassed in the MLSDI.
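The intensity indicators described above all share the same structure: an environmental key figure divided by GVA. The following sketch illustrates this for air emissions in gram CO2e per Euro of GVA and shows how relative decoupling can be read from two periods. The function names and all figures are hypothetical and serve illustration only.

```python
def intensity(key_figure, gva):
    """Ratio of an environmental key figure to GVA (same object and period)."""
    if gva <= 0:
        raise ValueError("GVA must be positive")
    return key_figure / gva


def is_relative_decoupling(intensity_before, intensity_after):
    """Relative decoupling: the intensity falls from one period to the next."""
    return intensity_after < intensity_before


GRAMS_PER_TONNE = 1_000_000
EURO_PER_MILLION = 1_000_000

# Hypothetical economic object: emissions in tonnes CO2e, GVA in million Euro.
year_1 = intensity(50_000.0 * GRAMS_PER_TONNE, 200.0 * EURO_PER_MILLION)
year_2 = intensity(52_000.0 * GRAMS_PER_TONNE, 260.0 * EURO_PER_MILLION)

decoupled = is_relative_decoupling(year_1, year_2)
```

In this toy case, absolute emissions rise while the intensity falls from 250 to 200 gram CO2e per Euro, which is why the growth indicator and the intensity indicator are both needed to judge effectiveness and efficiency.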

A further natural resource to be managed sustainably and efficiently (SDG 12.2) is water. The planetary boundary freshwater use is currently in the safe zone and has not been crossed (see Section 2.2.1; e.g. Steffen et al., 2015). To maintain this status, economic objects n should contribute to the improvement of water quality (SDG 6.3), protection of water-related ecosystems (SDG 6.6), and reduction of water pollution (SDG 12.4; SDG 14.1). Moreover, similar to air pollution, deaths and illnesses from water contamination ought to be minimised (SDG 3.9). Both key figures water use and waste water add to the meso-to-macro comparable measurement of these targets with their growth indicators growth of water use (GRI 303-1) and growth of waste water (GRI 306-1) as well as their ratio indicators water intensity (SDG 6.4.1; SDG 8.4) and waste water intensity (SDG 8.4).

Waste is another source of pollution, and especially hazardous waste should be assessed and managed (SDG 12.4). The key figure hazardous waste (GRI 306-2) results in the key indicators growth of hazardous waste (SDG 12.5) and hazardous waste intensity (SDG 8.4; SDG 12.4.2).

The last included topic of the environmental domain is taxation. Generally, fiscal policies should be adopted for greater equality (SDG 10.4), and in particular, environmentally harmful subsidies should be phased out (SDG 12.c.1). The polluter pays principle should be implemented, which was already a subject in the 1970s (UNCHE, 1972; WCED, 1987). Data on environmental tax are collected to compute the key indicator environmental tax intensity (SDG 12.c.1). This key indicator y features the exceptional positive effective direction as paying for environmental damages positively impacts environmental protection. Relating the tax to GVA is not optimal; standardising by the environmental damage in physical units would be preferable. This would require an aggregation of the diverse physical units arising from the multiple tax bases (see Section 5.2.1). The aggregation in turn would require a scaling procedure such as the scaling of the key indicators y (see Section 4.3.6). For simplicity, GVA is chosen as the standardising key figure xstd, implying that high value-generating economic objects n should channel financial resources for environmental protection. Furthermore, growth of environmental tax is not computed because it would not indicate the effectiveness of the taxation system but an increase in the tax bases and environmentally-damaging consumption. Evaluation of a taxation system's effectiveness is complex and typically investigated with computable general equilibrium models (e.g. Bergman, 2005). Research on environmental tax's effectiveness and relation to sustainable development can be found in, e.g. Bosquet (2000); R. E. López and Figueroa (2016); and Morley (2012).

The social key indicators y are determined in the following section, Section 5.3.1.2.

#### **5.3.1.2 Social sustainable development key indicators**

Main topics of the social domain's intersection of the meso GRI and the macro SDG disclosures are income and employment. Employment is more than a source of income (see Section 5.2.1; Harangozo et al., 2018), and both income and employment are key for life above the social boundaries (see Section 2.2.2; e.g. Raworth, 2012). However, social boundaries are not as well developed as the planetary boundaries are. The current framework is not universal but rather applicable to the developing than the developed world (see Section 2.2.2; Raworth, 2017). As the investigated geographical region is Germany, one of the seven major economies of the world (UN, 2019c), the social boundaries are disregarded, and only the GRI and the SDG disclosures are relied on. Social key indicators y generally feature a positive effective direction ξ<sup>+</sup> (see Table 5.6), and negatively affecting key indicators y are explicitly emphasised.

The first target to be covered by meso-economic and macro-economic objects n is poverty reduction (SDG 1.2), entailing the target full employment and decent work for all (SDG 8.5). Assessing contributions to these targets, the key figures compensation of employees (SDG 10.1) and employees (GRI 102-8) are acquired for computing the following growth indicators y<sup>g</sup> and ratio indicators y<sup>r</sup>: the growth of compensation of employees (SDG 10.1.1), growth of employees, average compensation of employees p.c., average compensation of employees per hour (p.h.) (SDG 8.5.1), and the labour share (SDG 10.4.1). The average compensations of employees are obtained by standardising the compensation of employees by the number of employees and by their working hours, respectively (see Table 5.6), alluding to an employee's average purchasing power. Employees are measured in headcount, including both part-time as well as full-time employees. This imprecision causes a distortion but cannot be avoided because data on employees in full-time equivalents are unavailable at two-digit level. The labour share provides information on the proportion of GVA granted to employees (see Table 5.6). Growth of working hours of employees is not computed. It is an accumulated measure that does not reveal information on the number of hours worked per employee per day or per week. Hours per employee per day or per week is a meso sustainable development disclosure (GRI 102-17), but macro data are not available.
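The three ratio indicators described in this paragraph can be sketched as simple quotients. The exact calculation schemes are given in Table 5.6, which is not reproduced here, so the formulas below follow the verbal description only; function names and figures are hypothetical.

```python
def average_compensation_pc(compensation, employees):
    """Compensation of employees per employee (headcount)."""
    return compensation / employees


def average_compensation_ph(compensation, working_hours):
    """Compensation of employees per hour actually worked."""
    return compensation / working_hours


def labour_share(compensation, gva):
    """Proportion of GVA granted to employees."""
    return compensation / gva


# Hypothetical economic object for one year (Euro, headcount, hours).
compensation = 1_200_000.0
employees = 30
working_hours = 48_000.0
gva = 2_000_000.0

pc = average_compensation_pc(compensation, employees)
ph = average_compensation_ph(compensation, working_hours)
share = labour_share(compensation, gva)
```

Because employees are counted in headcount, the per capita figure in such a calculation understates the compensation of full-time staff whenever part-time employment is common, which is the distortion noted above.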

In further achieving poverty reduction (SDG 1.2), social protection systems should be in force (SDG 1.3). Hence, the key figures socially-insured employees and marginally-employed employees are gathered. Their growth indicators y<sup>g</sup> comprise the growth of socially-insured employees and the growth of marginally-employed employees, with the resulting ratio indicator share of marginally-employed employees (SDG 1.3.1). The effective directions ξ of the growth and the share of marginally-employed employees are negative: Marginally-employed employees are not covered by social security systems (see Section 5.2.1; BA, 2019), and thus, employees should be prevented from this type of employment.

**Table 5.6** Social key indicators and their characterisation; CIT, Corporate Income Tax; GVA, Gross Value Added; p.c., per capita; p.h., per hour; VAT, Value Added Tax; Wp, Workplaces

Supporting SDG 10.2 and SDG 10.3 on inclusion and equal opportunities, discrimination against all women and girls should be ended (SDG 5.1). Assessing meso-economic and macro-economic objects' contributions to these targets, data on female socially-insured employees, (female) labour force, and female marginally-employed employees are collected. Growth indicators y<sup>g</sup> encompass the growth of female socially-insured employees and the growth of female marginally-employed employees. Because the (female) labour force is composed of the working population and unemployed people, its growth rate is only meaningful for overall economies and hence not included in the MLSDI. Ratio indicators y<sup>r</sup> are the quota of gender difference (SDG 16.7.1) and the quota of gender difference of marginally-employed employees (SDG 1.3.1). Calculation schemes of the quotas of gender differences are displayed in Table 5.6. The first parts of the differences represent the status of employment by gender in percentage. The second parts of the differences indicate possibilities of employment by gender with regard to the first parts of the equations: The share of female labour force represents the population of the share of female socially-insured employees, and the share of socially-insured employees constitutes the population of the share of female marginally-employed employees. Because equality is aimed at (SDG 10.2; SDG 10.3), neither men nor women should be privileged, and absolute values are taken. Moreover, striving for equality, an increase of the quotas of gender differences degrades social development, such that their effective directions ξ are negative.
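As an illustration of the quota of gender difference, the sketch below takes the absolute difference between the share of women among socially-insured employees and the share of women in the labour force. This formula is an interpretation of the verbal description above; the authoritative calculation scheme is the one in Table 5.6, which is not reproduced here. The function name and all figures are hypothetical.

```python
def quota_of_gender_difference(female_insured, insured, female_labour_force, labour_force):
    """Absolute gap between women's share in employment and in the labour force.

    Assumption: |female insured / insured - female labour force / labour force|.
    """
    employment_share = female_insured / insured
    population_share = female_labour_force / labour_force
    return abs(employment_share - population_share)


# Hypothetical figures in thousand persons.
quota = quota_of_gender_difference(
    female_insured=40.0,
    insured=100.0,
    female_labour_force=47.0,
    labour_force=100.0,
)
```

Taking the absolute value means that the indicator rises whether women or men are over-represented, which matches the negative effective direction described above.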

Continuing to operationalise empowerment and equal opportunities for all (SDG 10.2; SDG 10.3), the key figures severely-disabled employees and workplaces for severely-disabled employees are gathered. The growth of severely-disabled employees and the quota of severely-disabled employees (SDG 16.7.1) are computed to measure meso-economic and macro-economic objects' contributions to these targets. Growth of workplaces for severely-disabled employees is not calculated because these workplaces depend on the type and the size of an employer (see Section 5.2.1; BA, 2018). This fixed calculation scheme prevents individual performances, and the key figure workplaces for severely-disabled employees only serves standardisation.

Equal access to vocational education (SDG 4.3) and the increase in number of youths and adults who possess vocational skills (SDG 4.4) should be endeavoured. The key figure apprentices is gathered, and its key indicators growth of apprentices and share of apprentices are computed to assess meso and macro contributions to the aforementioned targets. The share of apprentices is the proportion of apprentices in socially-insured employees (see Table 5.6).

Fiscal instruments are demanded for reaching social development (SDG 10.4). The data collection for this target results in the key figures VAT (GRI 201-1), net taxes on products (GRI 201-1), CIT (GRI 201-1), and local business tax (GRI 201-1). Their ratio indicators y<sup>r</sup> are intensities (see Table 5.6) that state the share of GVA passed to the government. These taxes' growth indicators y<sup>g</sup> are excluded from the MLSDI because, similar to the environmental tax, their growth would not reveal effectiveness of the taxation system but an increase in the tax bases and economic activity, which is not part of sustainable development (see Section 2.2.3; e.g. Jackson, 2009).

The next section, Section 5.3.1.3, derives the MLSDI's economic key indicators y.

#### **5.3.1.3 Economic sustainable development key indicators**

The economic domain's alignment of GRI and SDG disclosures results in key indicators y that mainly strive for economic productivity. Enhancements of economic key indicators y imply improved sustainable development performances. Their effective directions ξ are positive. Because economic growth is only required to eliminate poverty (see Section 2.2.3; e.g. WCED, 1987), and Germany is one of the seven major economies of the world (see Section 5.3.1.2; UN, 2019c), economic growth indicators y<sup>g</sup> are disregarded for the present sample. One exception is made: The growth of working population is investigated as it contributes, jointly with the key indicators y on employment of the social domain, to the achievement of full and productive employment as well as decent work for all (SDG 8.5; see Section 5.3.1.2).

**Table 5.7** Economic key indicators and their characterisation; GVA, Gross Value Added; p.c., per capita; p.h., per hour; R&D, Research and Development; WP, Working Population

To increase economic productivities, technological upgrading should be accomplished (SDG 8.2). To this end, the key figures gross fixed assets (IAS 16.73d), net fixed assets (IAS 1.54), consumption of fixed capital (IAS 1.102; IAS 1.103; IAS 1.104), and gross fixed capital formation (IAS 7.21) are collected. The MLSDI's resulting ratio indicators y<sup>r</sup> read: gross capital productivity, net capital productivity, degree of modernity, consumed capital productivity, and investment intensity. The gross capital productivity indicates the value of the factor input gross fixed assets to realise GVA (see Table 5.7). The other productivity indicators of the economic domain function analogously. The gross fixed capital formation's ratio indicator y<sup>r</sup> is an intensity. The degree of modernity is the ratio of net and gross fixed assets, shedding light on the process of ageing as it represents the share of fixed assets that has not been consumed (Schmalwasser & Weber, 2012).
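
The capital-related ratio indicators described above can be sketched as follows. This is an illustration under stated assumptions, not the calculation scheme of Table 5.7: productivities are assumed to be GVA per unit of factor input, and the intensity is assumed to be the key figure per unit of GVA. Only the degree of modernity (net over gross fixed assets) is defined explicitly in the text. All names and figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CapitalFigures:
    """Key figures x of one economic object n (same currency unit)."""
    gva: float
    gross_fixed_assets: float
    net_fixed_assets: float
    consumption_of_fixed_capital: float
    gross_fixed_capital_formation: float


def capital_indicators(f: CapitalFigures) -> dict:
    """Return the five capital-related ratio indicators as plain ratios."""
    return {
        # Assumed form: productivity = GVA / factor input.
        "gross_capital_productivity": f.gva / f.gross_fixed_assets,
        "net_capital_productivity": f.gva / f.net_fixed_assets,
        "consumed_capital_productivity": f.gva / f.consumption_of_fixed_capital,
        # Stated in the text: ratio of net and gross fixed assets.
        "degree_of_modernity": f.net_fixed_assets / f.gross_fixed_assets,
        # Assumed form: intensity = key figure / GVA.
        "investment_intensity": f.gross_fixed_capital_formation / f.gva,
    }


# Hypothetical figures for illustration only.
example = CapitalFigures(gva=200.0, gross_fixed_assets=1000.0,
                         net_fixed_assets=600.0,
                         consumption_of_fixed_capital=50.0,
                         gross_fixed_capital_formation=40.0)
for name, value in capital_indicators(example).items():
    print(f"{name}: {value:.2f}")
```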

Targeting productivity through innovation (SDG 8.2), data on internal R&D expenditures (IAS 38.126; IAS 38.127) and R&D employees are collected. Technological knowledge may result in future economic benefits (IASB, 2018). The computed ratio indicators y<sup>r</sup> are internal R&D intensity (SDG 9.5.1) and share of R&D employees (SDG 9.5.2). Because R&D is an investment (Schmalwasser & Weber, 2012), its intensity instead of productivity is computed.

Additionally, SDG 8.2 suggests emphasising high value-added sectors. The key figure output is gathered for the computation of the GVA rate, which states the proportion of GVA in the output.

Labour-intensive sectors should be focused on to achieve higher levels of economic productivity (SDG 8.2). The key figures working population and working hours of working population are collected to compute the key indicators growth of working population (SDG 8.5; see Section 5.3.1.2), labour productivity p.c. (SDG 8.2.1), and labour productivity p.h.

As a last topic of the economic domain, international trade is considered. To strengthen developing countries, reduction of poverty (SDG 1.2; see Section 5.3.1.2) and enablement of decent work (SDG 8.5; see Section 5.3.1.2) should be targeted by significantly increasing exports of these countries (SDG 17.11). From Germany's point of view, imports from developing countries should be augmented because Germany is one of the major seven world economies (see Section 5.3.1.2; UN, 2019c). The key figures import, export, and imported input are collected to calculate the following ratio indicators y<sup>r</sup>: net import intensity and share of imported input. Their calculation schemes are indicated in Table 5.7.

To sum up, the MLSDI comprises several ratio indicators y<sup>r</sup> and several growth indicators y<sup>g</sup> to map efficiency and effectiveness. From the 36 acquired key figures x, 30 ratio indicators y<sup>r</sup> are computed, of which six belong to the environmental, 12 to the social, and another 12 to the economic domain. The total number of growth indicators Y<sup>g</sup> amounts to 14, with five environmental, eight social, and one economic growth indicator y<sup>g</sup>. The number of ratio indicators Y<sup>r</sup> and growth indicators Y<sup>g</sup> as well as the number of key indicators Y across the contentual domains are unbalanced. The environmental domain contains 11, the social domain is built by 20, and the economic domain consists of 13 key indicators y. The total number of key indicators Y amounts to 44. Due to limitations on data availabilities for economic objects n at two-digit level, several topics could not be included in the MLSDI.
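
The indicator counts stated above can be cross-checked with a few lines of arithmetic. The dictionary layout is merely an illustrative choice; the numbers are those given in the text.

```python
# Number of ratio and growth indicators per contentual domain (from the text).
ratio_indicators = {"environmental": 6, "social": 12, "economic": 12}
growth_indicators = {"environmental": 5, "social": 8, "economic": 1}

# Key indicators per domain are the sum of both indicator types.
key_indicators = {domain: ratio_indicators[domain] + growth_indicators[domain]
                  for domain in ratio_indicators}

total_ratio = sum(ratio_indicators.values())
total_growth = sum(growth_indicators.values())
total_key = sum(key_indicators.values())

print(key_indicators)
print(total_ratio, total_growth, total_key)
```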

Concluding on Section 5.3.1.1 to Section 5.3.1.3, several SDG targets are repetitively stated and measured by more than one key indicator y. Moreover, SDG targets do not always follow their goals' assignment to the contentual domains (e.g. Folke et al., 2016). For instance, a target that belongs to a social goal might be assigned to the environmental domain. Especially the environmental domain connects all three domains: Environmental efficiency regards the environmental and the economic domains, and health-related issues caused by environmental degradation concern the environmental and the social domains. Other examples have been provided in Section 2.2.4. These findings verify the interconnectedness of the goals and strengthen the assessment principle synergies and trade-offs (see Section 3.1; e.g. Costanza, Fioramonti & Kubiszewski, 2016) to be tackled by the MLSDI's weighting procedure (see Section 4.3.7 and Section 5.4).

The following section, Section 5.3.2, describes and analyses the summary statistics of the growth indicators y<sup>g</sup>.

#### **5.3.2 Summary statistics of the sustainable development growth indicators**

At this stage of the calculation, the key indicators y are unscaled and not comparable to each other. However, growth indicators y<sup>g</sup> are uniformly reported in percentages, and their empirical results reveal greater insights when unscaled: Their signs indicate the direction of change. The direction of change is desired to be in line with the effective direction ξ. For example, positively affecting key indicators y are desired to exhibit positive growth rates. Rescaled growth indicators y<sup>gs</sup> trade this straightforwardness for comparability to rescaled ratio indicators y<sup>rs</sup> (see Section 4.3.6.2) and subsequent aggregation (see Section 4.3.8). Therefore, summary statistics of the unscaled growth indicators y<sup>g</sup> are analysed in this section before the scaling procedure. Outlying key indicators y<sup>o</sup> are untreated, but conclusions of this analysis remain valid as growth indicators y<sup>g</sup> are characterised by a relatively low outlier rate β (see Section 5.3.3). Full summary statistics of both the unscaled growth indicators y<sup>g</sup> and the unscaled ratio indicators y<sup>r</sup> are provided in Appendix A.4, Table A.6 to Table A.8.

**Table 5.8** Summary statistics of the environmental growth indicators in the German economy from 2008 to 2016; Max, Maximum; Min, Minimum; Q1, 25th percentile; Q3, 75th percentile

Summary statistics classify a distribution according to its centre, spread or dispersion, and frequency. Central measures to be analysed are the mean and median. High central measures are endeavoured for key indicators y that feature a positive effective direction ξ<sup>+</sup>. Common measures of dispersion are the standard deviation, median absolute deviation, minimum, maximum, and the 25th and the 75th percentiles. Neither the standard deviation nor the median absolute deviation is included in the analysis because deviations from central measures are not crucial in sustainable development assessment, but deviations from targets should be quantified (see Section 3.1; e.g. Sala et al., 2015). Owing to a lack of data, targets could not be included but are replaced by distributional minima and maxima (see Section 4.3.6.2). Changes in the extremes signal alteration in the performance of the worst and the best economic object n, respectively. If a key indicator's effective direction ξ is positive, an increase in the extremes is desired. The 25th and the 75th percentiles are of interest in order to localise the interior 50% of the distribution. Analysed frequency measures are skewness and kurtosis. The relation of the mean and the median raises expectations about the skewness. If the mean exceeds the median, frequent values occur at the bottom, such that the distribution is positively (right) skewed. Vice versa, if the median surpasses the mean, frequent scores are located at the top, entailing negative (left) skewness. These rules on resulting skewness hold true in most but not all cases (von Hippel, 2005). A distribution is fairly symmetrical if absolute skewness remains below 0.5. Moderate skewness ranges between absolute values of 0.5 to 1.0, and distributions with absolute skewness higher than 1.0 are highly skewed (Bulmer, 1979). Negatively skewed distributions are favourable for sustainable development. In this case, light tails (negative kurtosis) are desired because the tail refers to the bottom of the distribution. The opposite is preferred for positively skewed distributions, such that the kurtosis is ambiguous for sustainable development. A distribution is platykurtic (light tails) for kurtosis values below −2.0, mesokurtic (normal) for scores between −2.0 and 2.0, and leptokurtic (heavy tails) for values above 2.0 (George & Mallery, 2005). If sustainability is reached, the distribution of key indicators y will be non-normal (Schmidt-Traub et al., 2017b). All statements can be reverted for key indicators y that have a negative effect on sustainable development.
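
The skewness and kurtosis brackets described above can be sketched in a few lines. The moment-based estimators below are an assumption, since the text does not name the estimator used, and kurtosis is taken as excess kurtosis so that a normal distribution scores zero. Function names and the sample are hypothetical.

```python
from statistics import mean, median


def central_moment(values, order):
    """Population central moment of the given order."""
    centre = mean(values)
    return sum((v - centre) ** order for v in values) / len(values)


def skewness(values):
    """Moment-based skewness (assumed estimator)."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    """Moment-based kurtosis minus 3 (assumed estimator)."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3.0


def classify_skewness(value):
    """Brackets after Bulmer (1979) as cited in the text."""
    magnitude = abs(value)
    if magnitude < 0.5:
        return "fairly symmetrical"
    if magnitude <= 1.0:
        return "moderately skewed"
    return "highly skewed"


def classify_kurtosis(value):
    """Brackets after George & Mallery (2005) as cited in the text."""
    if value < -2.0:
        return "platykurtic"
    if value > 2.0:
        return "leptokurtic"
    return "mesokurtic"


# Hypothetical sample: the mean exceeds the median, so positive skewness is expected.
sample = [1.0, 1.0, 1.0, 2.0, 10.0]
print(mean(sample) > median(sample), skewness(sample) > 0)
print(classify_skewness(skewness(sample)), classify_kurtosis(excess_kurtosis(sample)))
```

Applied to values reported in this section, a skewness of 1.20 would be classified as highly skewed and a kurtosis of 2.63 as leptokurtic.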

**Table 5.9** Summary statistics of the social and economic growth indicators in the German economy from 2008 to 2016; Max, Maximum; Min, Minimum; Q1, 25th percentile; Q3, 75th percentile

Summary statistics of the growth indicators y<sup>g</sup> are provided in Table 5.8 for the environmental domain and in Table 5.9 for the social and the economic domains. Central measures of the environmental domain's growth indicators y<sup>g</sup> are negative (see Table 5.8). Given their negative effective direction ξ<sup>−</sup>, this finding is desirable with regard to improved environmental effectiveness, supporting a variety of SDGs (see Section 5.3.1.1). The most negative growth rate is obtained for the median growth of hazardous waste: The median changed by −9.69% from 2008 to 2016. Its mean amounts to −6.07%. Moreover, the growth of hazardous waste is highly positively skewed (skewness of 1.20) and leptokurtic (kurtosis of 2.63). Asymmetry is directed towards the bottom (in favour of environmental protection), but frequent observations occur in the tails, which approach the top of the distribution. The other key indicators' growth rates feature moderate positive skewnesses and are mesokurtic. Growth of waste water is an exception as it is moderately skewed to the left and leptokurtic (undesired).

Results of the central measures of the growth indicators y<sup>g</sup> of the social and the economic domains follow their effective directions ξ (see Table 5.9), contributing to effective achievement of the respective SDGs (see Section 5.3.1.2 and Section 5.3.1.3). Only the growth of apprentices is not in line with this finding: Its mean and median are negative, amounting to −9.50% and −8.89%, respectively. The SDG target to increase the number of people with vocational skills (SDG 4.4) is missed, exacerbating the shortage of future skilled workers, who are already in short supply today (e.g. Bonin, 2019). The growth of severely-disabled employees experiences the lowest minimum (−80.48%) and highest maximum (106.93%). Skewnesses and kurtoses of the social growth indicators y<sup>g</sup> are mostly negligible and close to normal. Growth of severely-disabled employees and apprentices are exceptions with leptokurtic distributions; their kurtoses amount to 2.37 and 2.12, respectively. Because skewness is negligible, frequent values occur at both the bottom and the top. Bottom results are desired to be shifted towards the top.

The following section, Section 5.3.3, detects and treats outlying key indicators y<sup>o</sup>.

#### **5.3.3 Outlier detection and treatment**

**Figure 5.7** Outliers of the air emissions intensity in gram Carbon Dioxide Equivalents (CO2e) in the German economy from 2008 to 2016; **(a)** Histogram and frequency distribution; **(b)** Boxplot and quartiles

Outlier rates β and degrees of outlyingness are diverse across the three contentual domains and across ratio indicators y<sup>r</sup> and growth indicators y<sup>g</sup>. The environmental domain suffers most from outlyingness, with an outlier rate β of 10.77% and very strong outlying key indicators y<sup>o</sup>, especially for ratio indicators y<sup>r</sup>. The economic domain exhibits an outlier rate β of 8.66% and diverse degrees of outlyingness, ranging from none (e.g. GVA rate), weak (e.g. share of imported input), moderate (e.g. investment intensity), to strong (e.g. labour productivity p.h.). The social domain's outlier rate β is the lowest (3.09%), and outlyingness is weak. The outlier rate β of ratio indicators y<sup>r</sup> is more than twice as high as the growth indicators' outlier rate β: 8.06% vs. 3.34%.

Histograms, as displayed earlier in Figure 5.5, may assist the visual analysis of outliers. However, boxplots are more valuable in this context because they picture the IQR method. Boxes indicate the IQR q, whiskers denote the product of the outlier coefficient α and the IQR q, and outliers are expressed by circles. An exemplary histogram and boxplot of the key indicator air emissions intensity are shown in Figure 5.7a and Figure 5.7b, respectively. The key indicator air emissions intensity is chosen because it is typical of the environmental domain. The distribution of the air emissions intensity is positively skewed (average skewness of 3.16; see Table A.6), and numerous outlying key indicators y<sup>o</sup> exist at the top of the distribution. The mean equals 665.35 gCO2e per Euro, while the median only reaches 65.94 gCO2e per Euro from 2008 to 2016 (see Table A.6). This finding demonstrates the effect of masking and the inappropriateness of the mean and measures based on it for outlier detection (see Section 4.3.5.2; Field, 2009): The vast number of outlying key indicators y<sup>o</sup> at the top influences the mean to a degree that it exceeds the upper outlier threshold θmax equal to 619.63 gCO2e per Euro (see Table A.9). Outlier thresholds of each key indicator y can be found in Table A.9 to Table A.11 in Appendix A.5.

**Figure 5.8** Outliers of the share of imported input in percentage of input in the German economy from 2008 to 2016

As a further example, the share of imported input and its weak outlyingness are chosen (see Figure 5.8). Because outlying key indicators y<sup>o</sup> are weaker and fewer in number compared to the air emissions intensity, the box of the boxplot is larger, and whiskers are longer (see Figure 5.8b). Given the weakness of outlyingness, the mean is close to the median, not approaching the outlier thresholds θ (see Figure 5.8a).

In both examples, outlying key indicators y<sup>o</sup> occur at the top of the distribution. However, outlying key indicators y<sup>o</sup> at the bottom occur for the key indicators share of apprentices, VAT intensity, intensity of net taxes on products and net import intensity. Therefore, a two-sided outlier treatment is required.

After replacing outlying key indicators y<sup>o</sup> with the respective thresholds θ (see Table A.9 to Table A.11), key indicators y are rescaled and described along with their empirical findings in the following section, Section 5.3.4.
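
The two-sided IQR treatment described in this section can be sketched as follows. This is a minimal illustration, not the study's implementation: the outlier coefficient α is not stated here, so the conventional value 1.5 is assumed, the quartile method is likewise an assumption, and the sample values are hypothetical.

```python
from statistics import mean, median, quantiles


def iqr_thresholds(values, alpha=1.5):
    """Lower and upper outlier thresholds: Q1 - alpha*IQR and Q3 + alpha*IQR.

    alpha=1.5 is an assumed default, not a value taken from the source.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - alpha * iqr, q3 + alpha * iqr


def replace_outliers(values, alpha=1.5):
    """Replace values beyond the thresholds with the respective threshold."""
    lower, upper = iqr_thresholds(values, alpha)
    return [min(max(v, lower), upper) for v in values]


# Hypothetical intensities with two extreme values at the top.
intensity = [50.0, 55.0, 60.0, 65.0, 70.0, 75.0, 80.0, 900.0, 5000.0]

lower, upper = iqr_thresholds(intensity)
cleaned = replace_outliers(intensity)

print("thresholds:", lower, upper)
print("mean exceeds upper threshold:", mean(intensity) > upper)
print("median before and after:", median(intensity), median(cleaned))
print("maximum after treatment:", max(cleaned))
```

In this hypothetical sample the mean lies above the upper threshold while the median is unaffected, which mirrors the masking effect reported for the air emissions intensity.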

#### **5.3.4 Empirical findings of the cleaned and rescaled sustainable development key indicators**

The rescaled key indicators y<sup>s</sup> feature positive effective directions ξ<sup>+</sup> (see Section 4.3.6.2) and are free from missing values and outliers (see Figure 4.1). Key indicators y with a positive effective direction ξ<sup>+</sup> retain their labels after scaling, while negatively affecting key indicators' notations change. Negatively affecting growth indicators y<sup>g</sup> alter their denotation from "growth" to "reduction", environmental ratio indicators y<sup>r</sup> except the environmental tax intensity are now reported as efficiencies, the share of marginally-employed employees is interpreted as non-marginally-employed employees, and gender differences are translated into gender equalities. The labels are compared in Table 5.10.

**Table 5.10** Denotation of negatively affecting key indicators before and after scaling

The empirical findings of the cleaned and rescaled key indicators y<sup>s</sup> are analysed in two manners: Summary statistics are investigated in Section 5.3.4.1, and the selected branches (see Table 5.1) are analysed in Section 5.3.4.2. The evaluation of the performance scores follows Prescott-Allen (2001; see Section 4.3.6.2). Scores should be at least fair to be acceptable. Bad results require actions for improvements.
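
How a cleaned key indicator might be turned into a performance score and a verbal rating can be sketched as follows. Section 4.3.6.2 is not reproduced here, so both parts are assumptions: a min-max rescaling onto the interval from ten to 100 with the direction reversed for negatively affecting indicators, and five equal-width brackets (bad, poor, medium, fair, good) that are consistent with the scores quoted in this chapter. Function names and the handling of bracket boundaries are hypothetical.

```python
def rescale(value, minimum, maximum, positive_direction=True,
            lower=10.0, upper=100.0):
    """Min-max rescaling onto [lower, upper] (assumed scheme).

    For a negative effective direction the scale is reversed, so that
    higher scores always indicate better performance.
    """
    if maximum == minimum:
        raise ValueError("minimum and maximum must differ")
    share = (value - minimum) / (maximum - minimum)
    if not positive_direction:
        share = 1.0 - share
    return lower + (upper - lower) * share


def performance_band(score):
    """Assumed brackets of width 20; boundary handling is a guess."""
    if score < 20.0:
        return "bad"
    if score < 40.0:
        return "poor"
    if score < 60.0:
        return "medium"
    if score < 80.0:
        return "fair"
    return "good"


# Hypothetical growth indicator with a negative effective direction:
# the strongest reduction (-20 %) receives the best score.
print(rescale(-20.0, minimum=-20.0, maximum=10.0, positive_direction=False))
print(performance_band(67.04), performance_band(55.59))
```

Under these assumed brackets, the scores of 67.04 and 55.59 reported later in this chapter fall into the fair and medium brackets, respectively, in line with the text.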

#### **5.3.4.1 Summary statistics**

Interpretations of the summary statistics towards sustainable development (see Section 5.3.2) remain valid for rescaled performance scores with one additional aspect: If a key indicator's score of the 25th or the 75th percentile is higher than 25.00 (poor performance) or 75.00 (fair performance), respectively, it approximately contributes more to sustainable development than a normally-distributed key indicator y would.<sup>51</sup> Therefore, scores exceeding 25.00 and 75.00, respectively, are striven for.

Results of the rescaled growth indicators y<sup>gs</sup> of the environmental domain (see Table 5.11) are in line with their unscaled counterparts analysed in Section 5.3.2. Distributional properties among the rescaled environmental growth indicators y<sup>gs</sup> are relatively homogeneous. The economic objects n exhibit a medium central (mean and median) performance of environmental effectiveness. Only the median reduction of air emissions and both mean and median reduction of hazardous waste score fair results. The outstanding median reduction of hazardous waste of −9.69% (see Section 5.3.2) is converted into a score of 67.04, a fair and acceptable performance. Rescaled environmental ratio indicators y<sup>rs</sup> yield fair mean performances and good median performances (see Table 5.11). Central measures generally show stable, increasing trends. This is a positive finding for environmental efficiency as relative decoupling of environmental degradation and economic activity (SDG 8.4) is centrally achieved. The biggest increase in central environmental efficiency occurs for the hazardous waste efficiency's mean: It increased from 72.60 in 2008 to 77.67 in 2016, which corresponds to a growth rate of 6.98%. Because the median only increased by 2.40%, the mean's increase is presumably caused by few economic objects n. Enhancements by further economic objects n are desirable. The improvement of the hazardous waste efficiency is followed by the waste water efficiency's mean, which grew by 6.57% from 2008 to 2016. Concerning the 25th and the 75th percentiles, 50% of the distribution is shifted upward by one bracket: Instead of the normal poor to fair performances, at least medium to good performances are reached by 50% of the distribution, respectively. As a result, the distributions are mostly highly negatively skewed. Kurtoses are mostly negative but relatively small and negligible.
Not in favour of environmental protection are the extremes as they are nearly invariant over time without improvements. Constant extremes appear due to outlier treatment. The environmental tax intensity is an exception to these findings. Its central outcomes are poor to medium, the 25th and the 75th percentiles are below those of a normal distribution, and the data are highly positively skewed. Improvements over time are reported but insignificant.
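
The growth rate quoted for the hazardous waste efficiency's mean can be reproduced with a simple percentage change between the first and the last year. The helper name is hypothetical; the two scores are those reported above.

```python
def growth_rate(start, end):
    """Percentage change from the start value to the end value."""
    return (end - start) / start * 100.0


# Mean hazardous waste efficiency: 72.60 in 2008 and 77.67 in 2016.
print(round(growth_rate(72.60, 77.67), 2))
```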

<sup>51</sup>The contribution is only "approximately" higher because key indicators y are not rescaled on an interval from zero to 100 but ten to 100.

**Table 5.11** Summary statistics of the rescaled environmental key indicators in the German economy from 2008 to 2016; Max, Maximum; Min, Minimum; Q1, 25th percentile; Q3, 75th percentile

**Table 5.12** Summary statistics of the rescaled social key indicators in the German economy from 2008 to 2016; CIT, Corporate Income Tax; Max, Maximum; Min, Minimum; p.c., per capita; p.h., per hour; Q1, 25th percentile; Q3, 75th percentile; VAT, Value Added Tax

Rescaled key indicators y<sup>s</sup> of the social domain are more diverse than those of the environmental domain (see Table 5.12). Central measures feature wider ranges (poor to good), their trends are increasing as well as decreasing, and skewnesses and kurtoses are both positive and negative. The rescaled growth indicators y<sup>gs</sup> of the compensation of employees and employees achieve medium results with a rather normal shape. Average compensations of employees (p.c. and p.h.) exhibit central performances at the lower end of being medium. However, the average compensations of employees' minima
experienced increases of 47.13% and 83.98%, respectively, and the 25th percentiles advanced by 28.62% and 43.00%, respectively, from 2008 to 2016. Contributions to the SDG target of sustaining income growth of the bottom 40% are made (SDG 10.1). The labour share yields higher performances with mostly fair central measures. The IQR is reduced over time, and the GVA distributed to employees becomes more homogeneous across economic objects n. Central reduction of marginally-employed employees as well as the mean share of non-marginally-employed employees achieve fair results. Additionally, the latter rescaled key indicator y<sup>s</sup> steadily increases its performance, and its medians are rated good. The 25th and the 75th percentiles exceed those of a normal distribution, and 75% of the economic objects n perform at least medium (58.61 in 2008). This leads to a highly negatively skewed distribution with skewness amounting to −1.26 in 2016. This is favourable for implementation of social protection systems (SDG 1.3). Economic objects n achieve fair central results in the rescaled growth indicators y<sup>gs</sup> on female employees and the quota of gender equality of marginally-employed employees. In contrast, medium results are reported for the quota of gender equality. The quota of gender equality should be enhanced. However, marginal employment is more critical in view of social development as social protection is not provided (see Section 5.3.2; BA, 2019). Furthermore, the improvement of the minima of the quota of gender equality of marginally-employed employees is remarkable: It rose from 10.00 in 2008 to 39.89 in 2016. This improvement is attributable to the divisions 17 Manufacture of paper and paper products and 93 Sports activities and amusement and recreation activities (see Table A.1). Such results are also desirable for the quota of gender equality.
The growth rate and the quota of severely-disabled employees report medium central performances, with the quota experiencing a positive evolution over time. Their 25th percentiles exceed scores of 25.00, but the 75th percentiles remain below 75.00. Improvements in inclusion and equal opportunities for all are demanded (SDG 10.2). Unscaled growth rates of apprentices are negative (see Table 5.9) and are translated into a mean and median score of 55.59 and 55.76, respectively. Both results are classified as medium performances, requiring improvements. Apprentices' ratio indicator, share of apprentices, exhibits a negative trend (mean change of −15.56% from 2008 to 2016), and the 75th percentiles remain below the normal 75.00 after the first year of reporting. The aggravation of skilled workers' shortage revealed by the growth of apprentices (see Section 5.3.2) is confirmed by the ratio indicator share of apprentices. Rescaled social ratio indicators y<sup>rs</sup> on taxes score poor to medium central results. Owing to outlier treatment, the minima and the maxima are mostly constant. The VAT intensity varies over time, but the trend is not steady. The 75th percentiles of the tax intensities remain below 75.00, and the distributions are positively skewed. In conclusion, contributions to fiscal policies for greater equality (SDG 10.4) should be upgraded.

Summary statistics of the economic domain's rescaled key indicators y<sup>s</sup> are shown in Table 5.13. Similar to the social domain, distributional properties of rescaled economic key indicators y<sup>s</sup> diverge. The capital productivities and the investment intensity yield poor to medium central measures, ranging between 28.28 (median gross capital productivity in 2009) and 43.15 (mean net capital productivity in 2016). The extremes experience significant evolution neither at the bottom nor at the top. The 25th and the 75th percentiles are mostly located below normal percentiles, and the distributions are moderately to highly skewed to the right. The degree of modernity performs better and mostly achieves fair central scores. However, its trend is decreasing with a mean change of −5.49% from 2008 to 2016. A decreasing trend is also observed in its maxima. These diminished from 100.00 in 2010 to 88.81 in 2016. During the same period, the minima advanced from 10.00 to 20.78 (107.76%), entering the bracket of poor performance. Enhancement of economic productivity through technology (SDG 8.2) is realised only by bottom performers for the degree of modernity. In respect of innovation triggered by R&D activities, economic productivity is not tackled either. Performances of the internal R&D intensity and the share of R&D employees are bad (median) to poor (mean). The 75th percentiles remain below 50.00 (medium instead of normal fair performance), and the distributions are highly skewed to the right, which is unfavourable for economic sustainable development. GVA rates achieve medium central results and an increase of 82.70% in their minima from 2008 to 2016. However, the worst performer's growth is accompanied by a reduction of the best performer (−10.89%). Labour productivities yield poor to medium central scores and feature increasing trends in the minima.
However, in this case, the advancement of the minima is not associated with a deterioration of the best performer. Undesired positive skewness is present, signalling frequent values at the bottom. Performances supporting economic productivity through GVA-intensive and labour-productive activities (SDG 8.2) should be improved. Rescaled ratio indicators y<sup>rs</sup> on trade yield poor (central share of imported input) to fair (median net import intensity) scores. The 25th and the 75th percentiles of the share of imported input are located below normal percentiles, such that the distribution is heavily skewed to the right. Contributions to international trade (SDG 17.11) should be advanced.

**Table 5.13** Summary statistics of the rescaled economic key indicators in the German economy from 2008 to 2016; GVA, Gross Value Added; Max, Maximum; Min, Minimum; p.c., per capita; p.h., per hour; Q1, 25th percentile; Q3, 75th percentile; R&D, Research and Development

In conclusion, efficiency and effectiveness gains are present in the environmental domain. Rescaled ratio indicators y<sup>rs</sup>, which map efficiencies, reach fair to good central scores, but effectiveness gains should be enhanced from medium to at least fair performances. Moreover, environmental fiscal policies could be tightened as deficiency payments for environmental damages only yield medium central performances. In the social domain, the sample exhibits mostly medium performances for both efficiency and effectiveness. Improvements are desired with the exception of rescaled key indicators y<sup>s</sup> that depict social security protection. These yield fair performances for both efficiency and effectiveness. Rescaled economic key indicators y<sup>s</sup> paint a bleak picture, with desired upgrading in economic productivity. Note that key indicators y of the economic domain focus on productivities and investments, which are part of sustainable development. Economic growth is not represented in the economic domain because it is not key to sustainable development (see Section 2.2.3; e.g. Vermeulen, 2018).

After analysing the summary statistics of the rescaled key indicators y<sup>s</sup>, the next section, Section 5.3.4.2, deals with the rescaled key indicators' results of the selected branches (see Table 5.1).

#### **5.3.4.2 Comparative analysis of the selected branches**

The comparative analysis of the selected branches (see Table 5.1) conducted in this section is structured according to the three contentual domains. Efficiency and effectiveness of sustainable development contributions by the selected branches are first evaluated for the environmental domain, followed by the social and the economic domains. Rescaled ratio indicators' results refer to the last year of observation (i.e. 2016), whereas rescaled growth indicators y<sup>gs</sup> refer to changes from 2008 to 2016. Because the ratio indicators' trends are stable over time (see Table 5.11 to Table 5.13), results from 2016 are representative for the entire time horizon.

The fair to good central scores and the high negative skewnesses of the rescaled environmental ratio indicators' distributions (see Section 5.3.4.1) are generally reflected by the selected branches (see Figure 5.9): Most selected branches are clustered at the outskirts of the radar chart and yield good performances. Only a few economic objects n are located at the interior, scoring bad to poor performances. Industries in the service sector are environmentally efficient, thus obtaining low scores in the environmental tax intensity. The agricultural sector reports poor performances in three environmental efficiency indicators while achieving a fair performance in the waste water efficiency and good performances in the hazardous waste efficiency and the environmental tax intensity. Its environmental tax intensity transcends the chemical industry's tax intensity due to its lower economic productivity and resulting lower GVA generation (see below). In every other environmental efficiency indicator, the chemical industry is a bad performer. The health economy, which is a cross-sectional economy of both the manufacturing and the service sectors, is clustered along with the service sector's selected branches. Its stakes in the manufacturing sector are not concentrated on environmentally polluting industries. For example, only 5.23% of the chemical industry is attributable to the health economy in 2016 (see Appendix A.2).

**Figure 5.9** Environmental ratio indicators in rescaled performance scores for the selected branches in the German economy in 2016; IT, Information Technology

The environmental growth indicators' distributions are clustered approximately between medium and fair performance scores (see Figure 5.10). The best performers displayed are the financial industry in the reduction of air emissions (63.94 in 2016) and primary energy consumption (67.60 in 2016) as well as the car industry in the reduction of water use (65.83 in 2016) and waste water (65.06 in 2016). The IT industry scores best among the selected branches in the reduction of hazardous waste (77.05 in 2016). However, the IT industry's further outcomes are sparse. The chemical industry scores neither with environmental efficiency (see Figure 5.9) nor with environmental effectiveness: Its reduction rates are among the lowest and only yield medium performances for the reduction of air emissions (44.26 in 2016) and hazardous waste (42.51 in 2016). The agricultural industry obtains consistently medium scores, with the exceptions of a fair performance in the reduction of air emissions and a bad performance in the reduction of hazardous waste. However, it achieves a good performance in the ratio indicator hazardous waste efficiency (see Figure 5.9), so it may be concluded that a lack in the reduction of hazardous waste is less harmful.

**Figure 5.10** Environmental growth indicators in rescaled performance scores for the selected branches in the German economy; IT, Information Technology

Rescaled ratio indicators yrs of the social domain are rather distributed across the scale, and performances of the selected branches range from bad to good (see Figure 5.11). In contrast to the environmental domain, a segmentation of industries in the manufacturing and the service sectors is not observed. The financial industry stands out positively with regard to three tax indicators, average compensations of employees, labour share, and the share of non-marginally-employed employees, contributing to approaching decent work for all (SDG 8.5). A further leading industry is the car industry, with the highest results among the selected branches for the average compensations of employees, share of non-marginally-employed employees, and the quota of severely-disabled employees. Despite the high values in the average compensations of employees, the car industry only performs medium in the labour share and could distribute more income to its employees. Weaknesses of this industry are the quota of gender equality and the VAT intensity. Contributions to inclusion and equal opportunities (SDG 10.2; SDG 10.3; SDG 10.4) should be improved. The real estate industry's performances are diverse. It yields good performances in the gender equalities but bad performances in the labour share and the share of non-marginally-employed employees, harming decent work for all (SDG 8.5). The IT industry is a mid-ranging industry, which is neither among the best nor among the worst performers. The health economy operates well in the quota of gender equality of marginally-employed employees; fairly in the labour share, share of non-marginally-employed employees, and the quota of severely-disabled employees; but it features medium performances in the average compensation of employees p.h., the quota of gender equality, and the share of apprentices. Its average compensation of employees p.c. is only poor. Targets on social protection are managed (SDG 1.3), but targets on decent work (SDG 8.5) are not met. The overall German economy, which is typically located between the manufacturing and the service sectors, experiences an exceptional peak in the quota of gender equality. The share of female socially-insured employees and the share of female labour force, an indicator that always refers to the overall German economy, are nearly equivalent. A difference of 0.0004 percentage points is reported for the unscaled quota of gender difference in 2016. This is the sample's minimum and is translated into a rescaled performance score of 100.00. The agricultural industry is a poor performer and only scores fairly with the quotas of gender equalities and share of apprentices. The other industries are mid-ranging without extraordinary incidents.

**Figure 5.11** Social ratio indicators in rescaled performance scores for the selected branches in the German economy in 2016; CIT, Corporate Income Tax; IT, Information Technology; p.c., per capita; p.h., per hour; VAT, Value Added Tax

Rescaled growth indicators ygs of the social domain are relatively homogeneous among the selected branches (see Figure 5.12). The IT industry scores best. An exception is the reduction of female marginally-employed employees, where the financial industry takes first place. A further strength of the financial industry is the reduction of marginally-employed employees, which is in line with the corresponding efficiency indicator (see above). However, the financial industry exhibits poor to medium performances in the remaining social rescaled growth indicators ygs. Improvements are required to approach targets on, for example, decent work (SDG 8.5). The chemical industry stands out with good performances in the reduction of marginally-employed employees (85.55 from 2008 to 2016) and the reduction of female marginally-employed employees (84.14 from 2008 to 2016). It further operates fairly in the growth of apprentices. Its positive unscaled growth rate of 10.19% from 2008 to 2016 is transformed into a performance score of 72.13, positively contributing to Germany's shortage of skilled workers (see Section 5.3.2; e.g. Bonin, 2019) and the SDG 4.3 on vocational education. The agricultural and the real estate industries do not excel in efficiency but achieve mid-ranging results in effectiveness.

**Figure 5.12** Social and economic growth indicators in rescaled performance scores for the selected branches in the German economy; IT, Information Technology

Economic performances of the selected branches in the rescaled ratio indicators yrs are displayed in Figure 5.13. In accordance with impressions from the summary statistics (see Section 5.3.4.1), performances of the selected branches are skewed towards the interior of the radar chart. However, the real estate industry stands out in five economic rescaled ratio indicators yrs as the best performer among the selected branches. It achieves good results in the degree of modernity, investment intensity, GVA rate, and the labour productivities (p.c. and p.h.). It further stands out as the worst performer among the selected branches in six economic rescaled ratio indicators yrs: the gross capital productivity, net capital productivity, consumed capital productivity, internal R&D intensity, share of R&D employees, and the share of imported input. Economic productivity ought to be improved (SDG 8.2). The IT industry performs best in the gross and the net capital productivity, with further medium to good performances in several economic rescaled ratio indicators yrs. The chemical and the car industries yield similar performances, with fair results in the rescaled ratio indicators yrs on R&D and labour productivities. The agricultural sector's economic performance remains bad to medium, except for fair performances in the net import intensity. Economic productivity (SDG 8.2) is not achieved, but contributions to international trade (SDG 17.11) are realised. The manufacturing sector generally outperforms the service sector, with the health and the overall German economies in its midst.

**Figure 5.13** Economic ratio indicators in rescaled performance scores for the selected branches in the German economy in 2016; GVA, Gross Value Added; IT, Information Technology; p.c., per capita; p.h., per hour; R&D, Research and Development

The economic domain's only growth indicator – growth of working population – is reported along with the social domain (see Figure 5.12) and yields similar results to the social domain's growth indicators ygs.

After scaling, weights ω are derived, and the diverse weighting methods' results are presented in the next section, Section 5.4.

### **5.4 Weighting**

Three methods are applied to determine the MLSDI's weights ω and importance factors ψ: the PCA (see Section 4.3.7.2), PTA (see Section 4.3.7.3), and the MRMRB algorithm (see Section 4.3.7.4). The PC family requires a priori analyses of eigenvalues and explained cumulative variances to determine the included PCs. This is accomplished in the first subsection of this section, Section 5.4.1. Section 5.4.2 outlines the MRMRB algorithm's diagnostics, and Section 5.4.3 compares and discusses the empirical findings of the three weighting methods. The PC family further demands a posteriori evaluations of statistical test results, conducted in Section 5.4.4.

#### **5.4.1 The Principal Component (PC) family's eigenvalues and explained cumulative variances**

In the case of the PCA, PCs with eigenvalues larger than one are included (Kaiser's criterion), and at least 70% of the cumulative variance must be explained (see Section 4.3.7.2; e.g. Field, 2009). The modified Kaiser's criterion for the PTA requires retaining PCs with eigenvalues larger than the number of time periods T (see Section 4.3.7.3), which is equivalent to nine (see Section 5.1). The additional threshold on the explained cumulative variance remains unchanged.

**Figure 5.14** Eigenvalues and explained cumulative variances of the Principal Component Analysis (PCA) and the Partial Triadic Analysis (PTA); PCs, Principal Components

The PC family's eigenvalues and explained cumulative variances are shown in Figure 5.14. The application of Kaiser's criterion results in the inclusion of the first 11 PCs for the PCA (see Figure 5.14a). The first PC yields an eigenvalue of 10.30, while the 11th PC's eigenvalue amounts to 1.14. The additional threshold on the explained cumulative variance is not required as 83.95% of the sample's variance is explained by including the first 11 PCs: The additional threshold's dashed line crosses the solid line of Kaiser's criterion on the right and below the circled curve of the PCs (see Figure 5.14b). Compared to the PCA, the PTA's eigenvalues and number of included PCs are higher because time periods t are implicit variables. The first PC reaches an eigenvalue of 127.97, and the last PC included in the further analysis is the 13th PC (see Figure 5.14c), with an eigenvalue of 10.22. The resulting explained cumulative variance amounts to 88.39%, and the additional threshold is not required (see Figure 5.14d).
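The two retention rules described above can be sketched as follows. This is a minimal illustration rather than the implementation used in this study: the function name `select_components`, its parameters, and the toy eigenvalues are assumptions introduced here, and the explained variance is assumed to be the retained eigenvalues' share of the eigenvalue sum.

```python
def select_components(eigenvalues, kaiser_threshold=1.0, min_cum_var=0.70):
    """Return (number of retained PCs, explained cumulative variance)."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    # Kaiser's criterion: retain PCs whose eigenvalue exceeds the threshold.
    n_keep = sum(1 for ev in ordered if ev > kaiser_threshold)
    explained = sum(ordered[:n_keep]) / total
    # Additional threshold: add PCs until enough variance is explained.
    while explained < min_cum_var and n_keep < len(ordered):
        explained += ordered[n_keep] / total
        n_keep += 1
    return n_keep, explained


# Hypothetical eigenvalues, not the values behind Figure 5.14.
toy_eigenvalues = [3.0, 1.5, 0.3, 0.2]
n_pcs, cum_var = select_components(toy_eigenvalues)
print(n_pcs, round(cum_var, 2))
```

For the modified criterion of the PTA, the same sketch would be called with `kaiser_threshold=9`, the number of time periods T.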

Diagnostics of the MRMRB algorithm follow in the next section, Section 5.4.2.

#### **5.4.2 The Maximum Relevance Minimum Redundancy Backward (MRMRB) algorithm's discretisation and backward elimination**

The discretisation method applied in the MRMRB algorithm is equal frequency discretisation (see Section 4.3.7.4; Yang & Webb, 2009). The bin size χ<sup>s</sup> and the number of bins χ<sup>n</sup> equal 7.87. The backward elimination process of the MRMRB algorithm starts with rescaled key indicators y<sup>s</sup> that contain the lowest mutual information. The rescaled key indicators' ranking, with an ascending mutual information, can be found in Table 5.14. The quota of gender equality features the lowest mutual information and is hence eliminated first. The last eliminated rescaled key indicator y<sup>s</sup> is the energy efficiency. The backward elimination ranking diverges from the reverse ranking of importance factors (see Figure 5.15) because it refers to the integrated assessment before coefficients are adjusted to sum up to one in each contentual domain (see Section 4.3.7). The mutual information matrix is not attached, given its size of Y × Y, which is equivalent to 44 × 44.
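To make the two building blocks named above concrete, the following sketch bins a variable into equal frequency classes and computes the mutual information of two binned variables. It is an illustration under stated assumptions: the function names are hypothetical, the number of bins is rounded from 7.87 to 8 (the value matches the square root of the 62 economic objects, which is an inference), ties are broken by input order, and the backward elimination itself (see Section 4.3.7.4) is not reproduced.

```python
import math
from collections import Counter


def equal_frequency_bins(values, n_bins):
    """Assign bin labels so that all bins hold (nearly) the same count."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [0] * len(values)
    for rank, idx in enumerate(order):
        labels[idx] = min(rank * n_bins // len(values), n_bins - 1)
    return labels


def mutual_information(x_labels, y_labels):
    """Mutual information (in nats) of two discrete label sequences."""
    n = len(x_labels)
    joint = Counter(zip(x_labels, y_labels))
    px = Counter(x_labels)
    py = Counter(y_labels)
    mi = 0.0
    for (a, b), count in joint.items():
        # p(a, b) / (p(a) * p(b)) written with raw counts.
        mi += (count / n) * math.log(count * n / (px[a] * py[b]))
    return mi


n_objects = 62
n_bins = round(math.sqrt(n_objects))        # assumption: 7.87 rounded to 8
x = [float(i) for i in range(n_objects)]    # hypothetical indicator values
y = [value ** 2 for value in x]             # monotone transform of x
x_binned = equal_frequency_bins(x, n_bins)
y_binned = equal_frequency_bins(y, n_bins)
print(mutual_information(x_binned, y_binned))
```

Because `y` preserves the ranking of `x`, both variables receive identical bin labels, and their mutual information equals the entropy of the binned variable. Mutual information captures such non-linear dependence, which a linear correlation coefficient would understate.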

In the following section, Section 5.4.3, the PC family's weights ωPC resulting from the first 11 and 13 included PCs are analysed and compared to weights derived by the MRMRB algorithm ωMRMRB.

#### **5.4.3 Comparative analysis of weights**

Before analysing and comparing weights derived by the PC family ωPC and MRMRB algorithm ωMRMRB, the PTA's results of temporal assessment are examined. The PTA's weights of time periods ΩPTA range from 11.03% in 2008 to 11.16% in 2012, 2013, and 2014. These weights ΩPTA nearly correspond to equal weights (11.11%). In conclusion, the PTA provides evidence that the temporal dimension is irrelevant, and structures remain constant over time periods t. This finding supports the equal temporal weighting of the PCA and the MRMRB algorithm.

**Table 5.14** Rescaled key indicators' ranking according to the backward elimination of the Maximum Relevance Minimum Redundancy Backward (MRMRB) algorithm; CIT, Corporate Income Tax; GVA, Gross Value Added; p.c., per capita; p.h., per hour; R&D, Research and Development; VAT, Value Added Tax

Weights to be applied on the rescaled key indicators y<sup>s</sup> derived by the PC family ωPC and the MRMRB algorithm ωMRMRB are contrasted in Table 5.15 to Table 5.17. Weights derived by the PC family ωPC are generally similar to each other. Moreover, the PC family's weights ωPC remain close to equal weights. Equal weights would correspond to values of 9.09% in the environmental domain, 5.00% in the social domain, and 7.69% in the economic domain. Weights derived by the MRMRB algorithm ωMRMRB feature higher variations.
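The equal-weight benchmarks quoted above follow from simple division, as the short sketch below shows. The indicator counts per contentual domain (11, 20, and 13) are not stated in this section; they are inferred here from the reported percentages and are consistent with the total of 44 key indicators. The function name is hypothetical.

```python
# Counts of key indicators per contentual domain. These are inferred from the
# reported equal weights (an assumption of this sketch); they sum to 44.
indicators_per_domain = {"environmental": 11, "social": 20, "economic": 13}
time_periods = 9  # 2008 to 2016


def equal_weight_pct(n_items):
    """Equal weight in percent when n_items share a total of 100%."""
    return 100.0 / n_items


for domain, count in indicators_per_domain.items():
    print(f"{domain}: {equal_weight_pct(count):.2f}%")
print(f"time periods: {equal_weight_pct(time_periods):.2f}%")
```

The last line reproduces the temporal benchmark of 11.11% against which the PTA's weights of time periods are compared.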


**Table 5.15** Environmental key indicators' weights derived by the Principal Component Analysis (PCA), Partial Triadic Analysis (PTA), and the Maximum Relevance Minimum Redundancy Backward (MRMRB) algorithm

Weights ω applied on the rescaled environmental key indicators y<sup>s</sup> are shown in Table 5.15. Environmental rescaled ratio indicators yrs are generally weighted more heavily than their corresponding rescaled growth indicators ygs, with exceptions in the case of the PTA. Despite the exceptions, it may be concluded that focus should be directed towards environmental efficiency. The highest weight ω in the environmental domain is received by the topic climate change, with its rescaled key indicators y<sup>s</sup> on air emissions and energy consumption. The climate change topic is also emphasised in the GRI and the SDG disclosures (see Section 5.3.1.1). In the case of the MRMRB algorithm, energy efficiency obtains the highest weight ωMRMRB, with a value equivalent to 13.07%, exceeding the weight ωMRMRB of the closely related air emissions efficiency (10.36%). From a natural science perspective, rescaled key indicators y<sup>s</sup> on air emissions are contentually richer. However, from an anthropocentric point of view, sources of air emissions – among others primary energy consumption – ought to be managed (see Section 5.3.1.1). Thus, the MRMRB algorithm upgrades the energy efficiency and assigns the highest weight ωMRMRB to this rescaled key indicator y<sup>s</sup>. In contrast, the PC family does not distinguish between energy efficiency and air emissions efficiency but assigns similar weights ωPC to both (e.g. PTA: 9.64% and 9.65%, respectively). Rescaled growth indicators ygs on energy and air emissions are ascribed slightly lower weights ω, with higher variances in the case of the MRMRB algorithm. The second most important environmental topic identified by the MRMRB algorithm is efficiency of water use and waste water. The rescaled ratio indicators yrs on water obtain similar weights by the MRMRB algorithm ωMRMRB (11.14% and 11.15%, respectively), but the PC family allocates a higher weight ωPC to the water efficiency compared to the waste water efficiency (e.g. PCA: 9.17% vs. 8.88%, respectively). These weights ωPC are in line with the MRMRB algorithm's result on rescaled ratio indicators yrs on climate change: Rescaled key indicators y<sup>s</sup> that point towards the source of pollution receive a higher weight ω. Relatively low weights ω are allocated to the hazardous waste efficiency (e.g. MRMRB algorithm: 9.04%), despite the fact that it achieves the best central results among the rescaled key indicators y<sup>s</sup> of the environmental domain (see Section 5.3.4.1). This finding demonstrates that the weights' magnitudes do not depend on the empirical results of the rescaled key indicators y<sup>s</sup> but on their interconnectedness, reflecting synergies and trade-offs as desired (see Section 3.1; e.g. Costanza, Fioramonti & Kubiszewski, 2016).

Regarding the social domain, rescaled growth indicators ygs receive higher weights ω than their rescaled ratio indicators' counterparts (see Table 5.16). Most important in the social domain across the three weighting methods are the growth of socially-insured employees and the growth of employees (e.g. MRMRB algorithm: 7.85% and 7.38%, respectively). This finding is reasonable in two aspects: First, employment possesses a dual purpose (source of income and key to transition; see Section 5.2.1; Harangozo et al., 2018), and second, the key figure socially-insured employees is contentually richer than the key figure employees because employees include decent as well as precarious employment (see Section 5.2.1). A further interesting finding rests in the weighting of the rescaled key indicators y<sup>s</sup> on compensations of employees. The average compensation of employees p.h. receives a higher weight by the MRMRB algorithm ωMRMRB (6.89%) than the average compensation of employees p.c. (5.36%). This is reasonable because the latter rescaled key indicator y<sup>s</sup> is less precise, given its standardising key figure's mixture of full-time and part-time employees (see Section 5.2.1). Moreover, the labour share receives the lowest weight ωMRMRB (4.02%) among the rescaled key indicators y<sup>s</sup> on compensations of employees. From an employee's perspective this finding is reasonable: Not the proportion of the GVA distributed is of interest but the monetary value received in relation to the work done. The PTA follows the MRMRB algorithm's relation, but the magnitude is nearly insignificant. In opposition, the PCA does not pursue this relation but weights the labour share more heavily than the average compensation of employees p.c. 
A further reasonable finding is the MRMRB algorithm's higher (though nearly insignificant) weight ωMRMRB of the quota of gender equality of marginally-employed employees compared to the quota of gender equality (3.51% and 3.17%, respectively). At least two SDG targets are addressed with the first mentioned rescaled key indicator y<sup>s</sup> – SDG 1.3 (social protection) and SDG 5.1 (end discrimination against women and girls) – while the latter rescaled key indicator y<sup>s</sup> only addresses the SDG 1.3. The PCA reverses this relation and assigns a higher weight ωPCA to the quota of gender equality. With regard to apprentices, only the PTA reflects the problematic shortage of skilled labour (see Section 5.3.2; e.g. Bonin, 2019) and allocates a relatively high weight ωPTA of 5.16% to the share of apprentices.

**Table 5.16** Social key indicators' weights derived by the Principal Component Analysis (PCA), Partial Triadic Analysis (PTA), and the Maximum Relevance Minimum Redundancy Backward (MRMRB) algorithm; CIT, Corporate Income Tax; p.c., per capita; p.h., per hour; VAT, Value Added Tax

Table 5.17 displays weights ω of the economic rescaled key indicators y<sup>s</sup>. Among the rescaled ratio indicators yrs on capital, the gross capital productivity receives the highest weight ω by all three weighting methods. This finding may be justified by the fact that the gross capital productivity contains most information: It includes the current value of assets as well as the depreciated value in relation to the generated GVA (see Section 5.2.1; Section 5.3.1.3). The degree of modernity receives the lowest weight ω among the capital indicators because it disregards the GVA, which is essential in assessing economic productivity enhancements as of SDG 8.2. The GVA rate receives a relatively low weight by the MRMRB algorithm ωMRMRB (4.92%). It does not indicate productivity but merely value generation in proportion to the output (see Section 5.3.1.3). The PC family does not recognise the GVA rate's low explanatory power regarding productivity and assigns weights ωPC of 6.50%. Similar to the average compensations of employees, the labour productivities are weighted in an economically reasonable way by the MRMRB algorithm: The rescaled key indicator p.h. receives a higher weight ωMRMRB than its p.c. counterpart (7.29% vs. 6.46%, respectively). In contrast, the PC family neglects this aspect and favours the rescaled key indicator p.c. Last, the MRMRB algorithm weights the net import intensity more heavily than the share of imported input (8.42% vs. 5.78%, respectively). The net import intensity includes both imports of input and imports for final consumption and is thus informationally richer. The PTA follows this relation (however with a lower spread), but the PCA does not.

**Table 5.17** Economic key indicators' weights derived by the Principal Component Analysis (PCA), Partial Triadic Analysis (PTA), and the Maximum Relevance Minimum Redundancy Backward (MRMRB) algorithm; GVA, Gross Value Added; p.c., per capita; p.h., per hour; R&D, Research and Development

Weights ω indicate importances within a contentual domain but not towards the overall MLSDI c<sup>1</sup>. The key indicators' importance factors ψ towards the overall MLSDI c<sup>1</sup> are computed by adjusting weights ω with the rule of three (see Section 4.3.7). Figure 5.15 portrays the importance factors ψ in a decreasing order according to the MRMRB algorithm. Equal importance factors would correspond to values of 2.27%. In view of the MRMRB algorithm, most important towards the overall MLSDI c<sup>1</sup> are the growth of socially-insured employees, growth of employees, and the energy efficiency. Least important are the quota of gender equality, GVA rate, and the share of apprentices. Ordering of the importance factors derived by the PC family ψPC differs from the MRMRB algorithm's ordering: Highest importance factors ψPC are assigned to the gross capital productivity, growth of socially-insured employees, and the net capital productivity. Because employment serves a dual mission (see Section 5.2.1; Harangozo et al., 2018) and climate change is the main topic of the environmental domain (see Section 5.3.1.1), the MRMRB algorithm's ordering of importance factors ψMRMRB is more plausible.
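One way to read the rule of three is sketched below. The exact adjustment is defined in Section 4.3.7 and is not reproduced here, so the scaling is an assumption: each within-domain weight is multiplied by its domain's share of the 44 key indicators. The domain counts are inferred from the equal weights reported in Section 5.4.3, and the function name is hypothetical. Under this assumption, equal weights yield the reported 2.27%, and the MRMRB algorithm's three leading key indicators keep their reported order.

```python
# Indicator counts per domain are inferred from the reported equal weights
# (assumption); the proportional scaling below is likewise an assumption.
INDICATORS_PER_DOMAIN = {"environmental": 11, "social": 20, "economic": 13}
TOTAL_INDICATORS = sum(INDICATORS_PER_DOMAIN.values())  # 44


def importance_factor(weight_pct, domain):
    """Scale a within-domain weight (in %) by the domain's indicator share."""
    return weight_pct * INDICATORS_PER_DOMAIN[domain] / TOTAL_INDICATORS


# Weights reported for the MRMRB algorithm in this section.
reported = [
    ("growth of socially-insured employees", 7.85, "social"),
    ("growth of employees", 7.38, "social"),
    ("energy efficiency", 13.07, "environmental"),
]
for name, weight, domain in reported:
    print(name, importance_factor(weight, domain))

# Equal weighting as a benchmark: 100/11 % within the environmental domain.
print(round(importance_factor(100.0 / 11, "environmental"), 2))
```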

To sum up, the PC family does not clearly differentiate between diverse rescaled key indicators y<sup>s</sup>, and its weights ωPC stay close to equal weights. The main aspects of sustainable development are not captured. In contrast, the MRMRB algorithm assigns higher weights ωMRMRB to informationally richer rescaled key indicators y<sup>s</sup> by detecting higher order correlations. As a result, importance factors ψMRMRB towards the overall MLSDI c<sup>1</sup> correctly reflect the most important topics of sustainable development. In conclusion, the MRMRB algorithm outperforms the PC family, and the theoretical superiority of the MRMRB algorithm (see Section 4.3.7.4) is supported by empirical evidence. The MRMRB algorithm is the preferred and applied weighting method in the further analysis. For the German sample, the MRMRB algorithm suggests focusing on efficiency in the environmental domain and on effectiveness in the social domain.

Before applying weights derived by the MRMRB algorithm ωMRMRB on the rescaled key indicators y<sup>s</sup>, statistical tests of the PC family are examined in the following section, Section 5.4.4.

#### **5.4.4 Statistical tests of the Principal Component (PC) family**

**Figure 5.15** Importance factors of the Principal Component Analysis (PCA), Partial Triadic Analysis (PTA), and the Maximum Relevance Minimum Redundancy Backward (MRMRB) algorithm; CIT, Corporate Income Tax; GVA, Gross Value Added; p.c., per capita; p.h., per hour; R&D, Research and Development; VAT, Value Added Tax

Statistical tests of the PC family are conducted and analysed to verify the statistical validity of the PC family's results. Performed statistical tests include the KMO test for sampling adequacy and Bartlett's test of sphericity (see Section 4.3.7.5; e.g. Bartlett, 1950; Kaiser, 1970). To evaluate whether the tests should be based on Pearson's correlation coefficient for normal data or Kendall's tau for non-normal data, normality of the z-score scaled key indicators y<sup>z</sup> is tested in the fashion of the key figures' normality tests (see Section 4.3.3.4 and Section 5.2.2). The Shapiro-Wilk and the Kolmogorov-Smirnov tests both conclude that 20 z-score scaled key indicators y<sup>z</sup> are normally distributed and 14 z-score scaled key indicators y<sup>z</sup> are non-normal. Ambiguous results are obtained for the remaining ten z-score scaled key indicators y<sup>z</sup>, with the following pattern: Data are non-normal under the Shapiro-Wilk test but normal under the Kolmogorov-Smirnov test. Therefore, histograms are consulted, but a clear decision cannot be made. The test statistics and p-values are disclosed in Table A.12 to Table A.14, and two example histograms of z-score scaled key indicators y<sup>z</sup> with ambiguous test results are provided in Figure A.1 in the Appendix A.6. The average compensation of employees p.c. and the consumed capital productivity experience the weakest and the strongest rejections of the null hypotheses by the Shapiro-Wilk tests, with p-values of 0.04 and 0.0000, respectively. According to the multivariate Shapiro-Wilk test, the data are multivariate non-normal, with a test statistic of 0.7483 and a p-value less than or equal to 0.0001 (rejection of the null hypothesis). Given the ambiguities, the non-parametric Kendall's tau is preferred over the parametric Pearson's coefficient for the KMO test of sampling adequacy.
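The preference for a rank-based coefficient can be illustrated with a small sketch. The code below implements Kendall's tau in its tau-b variant next to Pearson's coefficient; which variant was applied in this study is not stated, so the choice of tau-b is an assumption, as are the function names and the toy data.

```python
import math


def kendall_tau_b(x, y):
    """Kendall's tau-b, computed from all pairs of observations."""
    concordant = discordant = tied_x_only = tied_y_only = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied in both variables: excluded from both terms
            if dx == 0:
                tied_x_only += 1
            elif dy == 0:
                tied_y_only += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    pairs_x = concordant + discordant + tied_y_only  # pairs not tied in x
    pairs_y = concordant + discordant + tied_x_only  # pairs not tied in y
    if pairs_x == 0 or pairs_y == 0:
        raise ValueError("tau-b is undefined when one variable is constant")
    return (concordant - discordant) / math.sqrt(pairs_x * pairs_y)


def pearson_r(x, y):
    """Pearson's product-moment correlation coefficient."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical, skewed toy data with a perfectly monotone relationship.
x = [1, 2, 3, 4, 5, 6]
y = [value ** 3 for value in x]
print(kendall_tau_b(x, y), pearson_r(x, y))
```

Kendall's tau depends only on the ordering of the observations and therefore reaches one for the monotone toy data, whereas Pearson's coefficient stays below one because the relationship is not linear.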

The KMO measure reveals the meritorious sampling adequacy of both the PCA and the PTA, with values amounting to 0.8370 (average from 2008 to 2016) and 0.8391, respectively. The null hypotheses of Bartlett's tests are rejected in both cases, with p-values less than or equal to 0.0001. The data are suitable for applying the PC family. In conclusion, results of the PC family as of Section 5.4.3 remain valid.

The following section, Section 5.5, analyses the resulting subindices d and the overall MLSDI c<sup>1</sup> based on the MRMRB algorithm's weights ωMRMRB.

### **5.5 Empirical findings of the four composite sustainable development measures**

The rescaled key indicators y<sup>s</sup> are weighted and geometrically aggregated to obtain the subindices of each contentual domain d (see Section 4.3.8). The subindices d are then aggregated into the overall MLSDI c<sup>1</sup> via the geometric mean. Summary statistics of the four composite measures are analysed in Section 5.5.1, and results of the selected branches are evaluated in Section 5.5.2.
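The aggregation step can be sketched as a weighted product. The snippet is an illustration, not the study's implementation: the function name and all scores are hypothetical, scores are assumed to be strictly positive, and equal weights are used only for the example.

```python
import math


def weighted_geometric_mean(scores, weights):
    """Weighted product of scores, with weights that sum to one."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to one")
    if any(score <= 0 for score in scores):
        raise ValueError("scores must be strictly positive")
    return math.exp(sum(w * math.log(s) for s, w in zip(scores, weights)))


equal = [1 / 3, 1 / 3, 1 / 3]
balanced = [50.0, 50.0, 50.0]     # hypothetical performance scores
unbalanced = [10.0, 50.0, 90.0]   # same arithmetic mean as `balanced`

print(weighted_geometric_mean(balanced, equal))
print(weighted_geometric_mean(unbalanced, equal))

# Overall index as the geometric mean of three hypothetical subindices.
subindices = [60.0, 45.0, 30.0]
print(weighted_geometric_mean(subindices, equal))
```

Both score profiles share an arithmetic mean of 50, yet the unbalanced profile aggregates to a lower value. This is the property of geometric aggregation that limits the compensation of bad performances by good ones.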

#### **5.5.1 Summary statistics**

The summary statistics of the subindices d mirror the impressions gained in the detailed descriptions and analyses of the rescaled key indicators y<sup>s</sup> (see Section 5.3.4.1). Highest scores in terms of the mean, median, maximum, and the 75th percentile are reached by the environmental subindex (see Table 5.18). Its lead is followed by the social domain, whereas the economic domain scores lowest.

The environmental subindex yields medium to fair central performances. Progress over time is insignificant, but the distributional shape is in favour of environmental protection: The medians exceed the means, and the 25th percentiles are located above a score of 25.00, resulting in moderate negative skewnesses. Focus should be placed on bottom performers to enhance their performances, lifting the central measures to be at least fair.

**Table 5.18** Summary statistics of the subindices and the overall Multilevel Sustainable Development Index (MLSDI) in the German economy from 2008 to 2016; Max, Maximum; Min, Minimum; Q1, 25th percentile; Q3, 75th percentile

Compared to the environmental subindex, the social subindex's central performances are weaker. A higher effort is required to yield fair performances. The social subindex's minima are the highest among the four composite measures' minima. However, the 75th percentiles do not reach the fair bracket, in which the normal score of 75.00 is located. To improve social development, focus should be placed on the centre of the distribution rather than on its bottom.

Among the three subindices d, the economic subindex performs worst. Its central scores are rated as poor performances, and enhancements over time of the central measures, maxima, and the percentiles are insignificant. Additionally, minima deteriorate in the course of time. The 25th percentiles just surpass the normal score of 25.00, and the 75th percentiles just reach the bracket of medium performances, remaining far from the normal fair performances at scores of 75.00. Moderate positive skewnesses result, which are undesirable distributional properties for economic prosperity. Major improvements are required across the whole distribution.

The overall MLSDI's distributional properties result from the subindices' properties. Central measures are located between the medium to fair performances of the environmental subindex and the poor performances of the economic subindex. However, the effect of the geometric aggregation comes to light. The overall MLSDI c<sup>1</sup> is inclined towards the poor economic performances: Its central measures only yield medium performances at the lower end of the bracket, and the 75th percentiles do not yield the normal 75.00.

The sample's results of the four composite measures are illustrated in Figure 5.16 and Figure 5.17. Figure 5.16 contains the four composite measures' performance scores of the 62 economic objects n in the German economy from 2008 to 2016. The environmental subindex features the highest spread, with relatively few economic objects n at the bottom and relatively many economic objects n at the top of the distribution. By comparison, the social subindex's spread is smaller, and especially the bottom of the distribution is enhanced. The economic subindex features relatively many outcomes at the bottom, and the overall MLSDI c<sup>1</sup> overlaps the subindices d. Progress over time has been made but should be enhanced for higher significance.

Figure 5.17 plots the four composite measures' frequency distributions and densities, strengthening the empirical findings of the previous analysis. The environmental domain exhibits economic objects n with bad performances. Improvements should focus on these. The social domain is not represented at the top of the distribution but should be. Last, economic performances should be enhanced in their entirety.

**Figure 5.16** The four composite measures in rescaled performance scores in the German economy from 2008 to 2016; MLSDI, Multilevel Sustainable Development Index

**Figure 5.17** Frequency distribution and density of the four composite measures in the German economy in 2016; MLSDI, Multilevel Sustainable Development Index

#### **5.5.2 Comparative analysis of the selected branches**

The environmental subindices for the selected branches are displayed in Figure 5.18. Results are relatively stable over time except for volatilities in the agricultural sector and the car industry at the beginning of the time horizon. Given the financial industry's fair performances in the environmental ratio indicators y<sup>r</sup> and the environmental growth indicators y<sup>g</sup> (see Section 5.3.4.2), its environmental subindex ranks first. The car industry belongs to the top performers owing to its fair environmental effectiveness. The health and the overall German economies are located between the manufacturing and the service sectors. The IT industry features good environmental efficiency performances but is downgraded, given its sparse performances in environmental effectiveness. In contrast to the chemical industry, the agricultural sector offsets its bad performances in the air, energy, and the water efficiency by fair performances in the further rescaled environmental ratio indicators yrs and environmental effectiveness. The result is an environmental subindex around 40.00 (medium). At the bottom of the distribution, the following branches should be prioritised for improvements in environmental protection along with the chemical industry: 19 Manufacture of coke and refined petroleum products; 23 Manufacture of other non-metallic mineral products; 24 Manufacture of basic metals; D Electricity, gas, steam, and air conditioning supply; and 17 Manufacture of paper and paper products (see Section 5.5.1 and Table A.1).

**Figure 5.18** Environmental subindex in rescaled performance scores for the selected branches in the German economy from 2008 to 2016; IT, Information Technology

The social subindices of the selected branches feature slightly increasing trends (see Figure 5.19). The financial and the car industries feature unbalanced performances (bad to poor and fair to good performances) in the rescaled social key indicators y<sup>s</sup> (see Section 5.3.4.2). Their social subindices are downgraded because the weighted product punishes bad performances. These cannot be offset easily, and balanced performances yield better aggregated scores. The IT industry is the leader among the selected branches with respect to the social subindex, given its balanced medium to fair performances. The chemical industry and the aggregated branches are mid-ranging. The real estate industry also suffers from the geometric aggregation: Its several bad to medium performances cancel out its other fair to good performances in the social domain. The agricultural sector performs worst.

**Figure 5.19** Social subindex in rescaled performance scores for the selected branches in the German economy from 2008 to 2016; IT, Information Technology

The economic subindex is slightly volatile, and increasing trends are visible for some industries towards the end of the time horizon (see Figure 5.20). Similar to the social subindex, the IT industry ranks first owing to its consistently fair to good performances. The car and the chemical industries are mid-ranging, featuring several fair and several poor performances. As the real estate industry is heavily unbalanced and stands out in both good and bad performances (see Section 5.3.4.2), its geometrically aggregated score is relatively low, just entering the bracket of medium. The financial industry yields a slightly better economic subindex with stable poor to medium performances. Once more, the agricultural sector is the worst performer, with poor performance scores. This sector is important for sustainable development and is therefore explicitly addressed in the SDGs. For instance, targets on agricultural productivity are established (SDG 2.3; SDG 2.4). The data analysis of this work highlights that the agricultural sector requires assistance in contributing to sustainable development.

**Figure 5.20** Economic subindex in rescaled performance scores for the selected branches in the German economy from 2008 to 2016; IT, Information Technology

Last, Figure 5.21 portrays the overall MLSDI c<sup>1</sup> for the selected branches. Due to constant medium to good performances in the subindices d, the IT industry comes first with regard to overall sustainable development. The second rank is taken by the car industry. The car and the chemical industries perform similarly in the social and the economic domains. However, the environmental domain sorts the wheat from the chaff: The chemical industry does not recover from its poor environmental performances because the geometric mean restricts substitutability of the domains. The criterion to implement weak sustainability with minimised substitutability (see Table 4.1) is realised and comes into effect in aggregating the rescaled key indicators y<sup>s</sup> into the subindices d (see above) and in aggregating the subindices d into the overall MLSDI c<sup>1</sup>.

**Figure 5.21** Overall Multilevel Sustainable Development Index (MLSDI) in rescaled performance scores for the selected branches in the German economy from 2008 to 2016; IT, Information Technology
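The punishing effect of geometric aggregation on unbalanced performances, as described above, can be illustrated with a minimal sketch. The scores, the equal weights, and the function names are hypothetical and chosen for illustration only; they are not taken from the MLSDI data or its weighting results. The sketch assumes strictly positive scores and weights that sum to one.

```python
import math


def weighted_arithmetic_mean(scores, weights):
    """Weighted sum: a low score can be fully offset by a high one."""
    return sum(w * s for s, w in zip(scores, weights))


def weighted_geometric_mean(scores, weights):
    """Weighted product: low scores pull the aggregate down disproportionately.

    Assumes strictly positive scores and weights that sum to one.
    """
    return math.exp(sum(w * math.log(s) for s, w in zip(scores, weights)))


# Hypothetical rescaled scores in three domains (illustrative values only).
balanced = [50.0, 50.0, 50.0]
unbalanced = [10.0, 50.0, 90.0]
weights = [1 / 3, 1 / 3, 1 / 3]  # equal weights, an assumption for this sketch

for name, scores in [("balanced", balanced), ("unbalanced", unbalanced)]:
    am = weighted_arithmetic_mean(scores, weights)
    gm = weighted_geometric_mean(scores, weights)
    print(f"{name}: arithmetic = {am:.2f}, geometric = {gm:.2f}")
```

Both hypothetical objects share the same arithmetic mean, but the unbalanced object's geometric mean falls to about 35.6, which mirrors the pattern reported for branches with unbalanced performances.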

The next section, Section 5.6, analyses the MLSDI's sensitivities.


**Table 5.19** Average rank shifts of economic objects by the four composite measures and the three outlier and weighting methods in 2016; α, outlier coefficient; MLSDI, Multilevel Sustainable Development Index; MRMRB, Maximum Relevance Minimum Redundancy Backward algorithm; PCA, Principal Component Analysis; PTA, Partial Triadic Analysis

#### **5.6 Sensitivity analyses**

Sensitivity analyses should be carried out for calculation steps with alternative approaches (see Section 4.3.9). These include missing value imputation (see Section 4.3.3 and Section 5.2.2), outlier detection (see Section 4.3.5 and Section 5.3.3), and weighting (see Section 4.3.7 and Section 5.4). However, because Amelia II yields implausible results (see Section 5.2.2), no alternative for missing value imputation remains. Hence, only sensitivities of outlier detection and weighting are tested and analysed.

Average shifts in economic objects' ranks by the four composite measures and the three outlier detection methods are displayed in the first three columns of Table 5.19. Full disclosure of the economic objects' ranks by outlier coefficient α can be found in Table A.15 in Appendix A.7. As a result of a change of the outlier coefficient α from 1.5 to 3.0, economic objects n alter their ordinal rank position in the environmental subindex on average by 2.94. With regard to the social and the economic subindices, average rank shifts are slightly lower with values approximately equal to 2.50. Lower outlier rates β in these two domains are responsible for this result. The average rank shifts of the social and the economic subindices are approximately equal despite the fact that the economic domain's outlier rate β exceeds the social domain's outlier rate β (8.66% vs. 3.09%; see Section 5.3.3). This finding is explained by the differences in the degree of outlyingness: The social domain involves strong outlying key indicators y<sup>o</sup> (e.g. key indicators y on taxes; see Section 5.3.3) that are treated in both outlier treatment cases (α = 1.5 and α = 3.0), whereas the economic domain features mixed outlying key indicators y<sup>o</sup> (weak to strong; see Section 5.3.3) that are only treated partially in the laxer case. The highest average rank shift is reported for the overall MLSDI c<sup>1</sup> (3.09) because average rank shifts of the contentual domains reinforce each other. When comparing both outlier detection cases to the non-treatment case, average rank shifts increase. The increases in rank shifts are in line with the outlier rates β because in the non-treatment case, outlying key indicators y<sup>o</sup> are not treated at all irrespective of the degree of outlyingness. The maximum average rank shift is reported by the environmental subindex and detection at the inner fence vs. the non-treatment case. Generally, average rank shifts of this case exceed average rank shifts of the detection at the outer fence vs. the non-treatment case because detection at the outer fence is laxer, such that fewer key indicators y are classified as outlying key indicators y<sup>o</sup>.

**Figure 5.22** The four composite measures by the three outlier detection methods in rescaled performance scores in the German economy in 2016; MLSDI, Multilevel Sustainable Development Index
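The average rank shift discussed above can be sketched as the mean absolute difference between each object's ordinal rank under two calculation variants. This definition is an assumption about how the statistic is computed; the scores, object labels, and function names below are hypothetical.

```python
def ranks(scores):
    """Return ordinal ranks (1 = best) for a dict mapping object -> score.

    Ties are broken by object name for reproducibility; this tie rule is
    an illustrative choice, not taken from the source.
    """
    ordered = sorted(scores, key=lambda obj: (-scores[obj], obj))
    return {obj: position for position, obj in enumerate(ordered, start=1)}


def average_rank_shift(scores_a, scores_b):
    """Mean absolute change in rank between two scorings of the same objects."""
    ranks_a, ranks_b = ranks(scores_a), ranks(scores_b)
    return sum(abs(ranks_a[obj] - ranks_b[obj]) for obj in ranks_a) / len(ranks_a)


# Hypothetical subindex scores of four objects under two outlier coefficients.
scores_alpha_15 = {"A": 62.0, "B": 55.0, "C": 41.0, "D": 38.0}
scores_alpha_30 = {"A": 58.0, "B": 60.0, "C": 37.0, "D": 39.0}

print(average_rank_shift(scores_alpha_15, scores_alpha_30))
```

In this toy case, every object moves by exactly one position, so the average rank shift equals 1.0.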

Figure 5.22 displays the sample's four composite measures by the three outlier detection methods. First, distributional differences are remarkable for the subindices d that feature relatively high outlier rates β. These are the environmental and economic subindices. In the non-treatment case, the economic objects n are closely clustered, and the distributions feature low spreads. In the environmental domain, the distribution is clustered at the top because strong outlying key indicators y<sup>o</sup> exist at the bottom (see e.g. Figure 5.7b<sup>52</sup>). As a result of removing these, scales of the key indicators y are shortened, such that more economic objects n feature bad or poor performances in the key indicators y. Consequently, these economic objects' environmental subindices are downgraded towards the lower end of the distribution. In the economic domain, the opposite occurs: In the wake of outlier treatment, the distribution spreads towards the top because outlying key indicators y<sup>o</sup> rather exist at the top (see e.g. Figure 5.8b). Second, variations in the outlier coefficient α only result in significant changes in the economic domain. This domain is the only domain with numerous weak to moderate outlying key indicators y<sup>o</sup>. These are not detected in the laxer detection case. The sensitivities of the sample's full frequency distributions to the outlier detection method can be found in Figure A.2 in Appendix A.7.

<sup>52</sup>Because of the air emissions intensity's negative effective direction ξ<sup>−</sup>, the portrayed outlying key indicators y<sup>o</sup> at the top constitute outlying key indicators y<sup>o</sup> at the bottom in view of the composite measures.

**Figure 5.23** The four composite measures by the three weighting methods in rescaled performance scores in the German economy in 2016; MLSDI, Multilevel Sustainable Development Index; MRMRB, Maximum Relevance Minimum Redundancy Backward algorithm; PCA, Principal Component Analysis; PTA, Partial Triadic Analysis
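The difference between detection at the inner fence (α = 1.5) and at the laxer outer fence (α = 3.0) can be sketched with a generic IQR rule. The fence formula Q1 − α·IQR and Q3 + α·IQR, the quartile convention, and the data are assumptions for illustration; the sketch covers detection only and does not reproduce the treatment applied to detected outliers in this work.

```python
import statistics


def iqr_outliers(values, alpha=1.5):
    """Return the values outside [Q1 - alpha * IQR, Q3 + alpha * IQR].

    Quartiles use the 'inclusive' convention of statistics.quantiles;
    other conventions may yield slightly different fences.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - alpha * iqr, q3 + alpha * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical key indicator with one moderate (30) and one strong (60) outlier.
indicator = [10, 12, 13, 14, 15, 16, 17, 18, 20, 30, 60]

inner_fence = iqr_outliers(indicator, alpha=1.5)  # stricter detection
outer_fence = iqr_outliers(indicator, alpha=3.0)  # laxer detection

print("alpha = 1.5:", inner_fence)
print("alpha = 3.0:", outer_fence)
```

With these hypothetical values, the moderate outlier is detected only at the inner fence, whereas the strong outlier is detected in both cases. This corresponds to the observation that the outlier coefficient α matters mainly where weak to moderate outlying key indicators are present.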

Average rank shifts as a result of a change in the weighting method range from 0.3824 to 2.09 (see Table 5.19). The four composite measures are relatively robust against changes in the weighting method. Average rank shifts of the PC family remain below 1.00: On average, economic objects n change their ranks in the four composite measures by less than one position. This finding is in line with the PC family's similar weights ω<sup>PC</sup> (see Section 5.4.3). Changing the weighting method from the PC family to the MRMRB algorithm yields slightly higher average rank shifts. A complete report of the economic objects' ranks by the four composite measures and the three weighting methods is provided in Table A.16 in Appendix A.7.

Figure 5.23 illustrates the sample's distributional changes as a result of the different weighting methods, supporting the finding from the average rank shifts: The four composite measures' distributions are relatively stable and robust to alterations in the weighting method. Full frequency distributions of the sample by the four composite measures and the three weighting methods can be found in Figure A.3 in Appendix A.7.

In conclusion, economic objects' rankings and performance scores are sensitive to outlier detection but not to weighting. Outlier treatment distorts the true picture (see Section 4.3.5.1; McGregor & Pouw, 2017) but is desirable to remove statistical biases. Outlier treatment should be carried out in order to shorten scales and spread out the closely clustered economic objects n. This enables differentiation between economic objects n, which is required to direct actions for improvement in sustainable development. The environmental domain in particular profits from this distortion of the true picture because observations are shifted towards the bottom. Economic objects n at the bottom should be prioritised for improved environmental protection. The strictness of the outlier detection method only has an impact if weak to moderate outlying key indicators y<sup>o</sup> are present. This is especially the case in the economic domain. To also reduce statistical bias in this domain, the stricter base case (α = 1.5) is preferred. Furthermore, the superior MRMRB algorithm remains the recommended weighting method.

#### **5.7 Summary**

In this chapter, the novel methodology of the MLSDI has been deployed to the sample region Germany. 62 branches of the German economy as well as five aggregated branches, including the cross-sectional health economy, constitute the objects of investigation. The time horizon reaches from 2008 to 2016. Sustainable development key figures are collected from statistical authorities. The sophisticated multiple panel data imputation algorithm Amelia II fails because the normality assumption is violated; missing values are therefore imputed by single time series imputation. 44 sustainable development key indicators are derived by aligning the meso GRI and the macro SDG frameworks, establishing multilevel comparability of the MLSDI and finally addressing the perspective gap empirically. Outliers are treated by the IQR method and are especially strong in the environmental domain. Weights are derived by the PCA, PTA, and the MRMRB algorithm. The theoretical advantage of the MRMRB algorithm to capture higher order correlations is confirmed by the empirical findings: The MRMRB algorithm weights informationally richer indicators more heavily, while the PC family does not establish this clear pattern. Environmental efficiency indicators on climate change and social effectiveness indicators on employment receive the highest weights and should be prioritised for improvements in sustainable development performances. The application of the geometric aggregation achieves the desired effect of weak sustainability with minimised substitutability: Bad performances are punished and cannot be easily compensated. In conclusion, industries with unbalanced performances lag behind industries with rather balanced results. The comparative analysis of the selected branches demonstrates their contributions to sustainable development. The IT industry contributes most, while improvements in the chemical industry's environmental performance and the agricultural industry's performance with respect to all domains are required. The agricultural industry's importance for sustainable development is highlighted in the SDGs; thus, actions and aid are urgently needed. Generally, the environmental domain yields the highest central outcomes, while the economic domain yields the lowest results. The environmental domain requires improvements in its bottom performers, whereas the economic domain demands enhancements across the whole distribution. The sensitivity analyses on outlier detection and weighting confirm the previously derived results.

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

### **Chapter 6**

### **Discussion and conclusion**

This chapter discusses and reflects on the accomplished theoretical (see Chapter 2 and Chapter 3), methodological (see Chapter 4), and the empirical research (see Chapter 5). The present work is part of Phase C of the transdisciplinary research agenda in sustainability science (see Section 2.3.4; e.g. Lang et al., 2012). It draws on previous studies and problem framings from research and practice (Phase A), makes use of prior disclosures from the scientific and the practitioner community (Phase B), and finally provides new results that are relevant for both research and practice (Phase C). Implications of the results for research, which bear on the descriptive-analytical mode of sustainable development, are discussed in Section 6.1. Section 6.2 provides implications for practice, which relate to the transformational mode outside the science community (see Section 2.1; Wiek et al., 2012). Section 6.3 discusses limitations of the present study, unfolding opportunities for future research. This dissertation ends with an overall summary and conclusion in Section 6.4.

#### **6.1 Implications for research**

This work contributes to the debate on measurement and assessment of sustainable development performances. In particular, it contributes a novel sustainable development indicator set that includes a composite measure. Five related research gaps have been identified: the perspective, operational-to-normative, knowledge, and the sustainability gaps as well as methodological deficiencies of existing sustainable development indices. On the one hand, sustainable development demands multiple perspectives (see Section 2.3.1; e.g. Lock & Seele, 2017) because the macro SDGs can only be achieved if micro and meso objects contribute (see Section 2.3.1 and Section 2.3.2; e.g. Dahl, 2012; Griggs et al., 2014; T. Hahn et al., 2015). However, multiple perspectives are frequently disregarded outside the sustainability transitions literature, constituting the perspective gap. This work is the first to include the multilevel perspective in a conceptual framework of sustainable development (see Section 2.3.1; Rotmans et al., 2001) and thereby updates existing frameworks (see Chapter 2; e.g. Chofreh & Goni, 2017). The perspective gap has been closed theoretically, and further contributions result. First, this work is the first to review sustainable development assessment methods by a method's level of applicability (i.e. by the aggregational size of an object of investigation; see Figure 3.1). This organisation is advantageous in further aspects that are outlined in Section 6.2. Second, resulting from this review and based on the sustainable development assessment principles, this work is the first to identify the most suitable multilevel assessment method for comprehensive sustainable development measurement. Indicator sets that include a composite measure have been revealed as such a method. Third, this work contributes an advanced multilevel indicator set that includes a composite measure and can be applied to meso and macro objects for comparative analyses and benchmarking. The intersection of the meso GRI and the macro SDG frameworks at target level as outlined in GRI and UNGC (2018a) has been refined to indicator level and adjusted to current data availabilities for the German economy by official statistics. On the other hand, decisions for sustainable development should be made at operational, strategic, and normative tiers (see Section 2.3.2; e.g. Ulrich, 2001). An operational-to-normative gap is present because decision makers mostly address the operational tier only (see Section 2.3.2; e.g. Baumgartner & Rauter, 2017). Including the St. Gallen management model in the conceptual framework also points towards indicator sets that include a composite measure as the most successful tool in comprehensive multilevel measurement of sustainable development performances: Indicators and indices address the operational and the strategic tiers (see Section 3.2; e.g. Baumgartner, 2014) while being inherently normative (see Section 3.2; e.g. Waas et al., 2014).

© The Author(s) 2021 C. Lemke, *Accounting and Statistical Analyses for Sustainable Development*, Sustainable Management, Wertschöpfung und Effizienz, https://doi.org/10.1007/978-3-658-33246-4\_6

The third identified research gap is the knowledge gap (see Section 2.3.3; e.g. Weitz et al., 2018). By tackling this gap, this work contributes insights about the interconnections of individual sustainable development elements. In doing so, this work is the first to apply an entropy-based information-theoretic algorithm to compute a sustainable development index. Indices in the field of environmental sustainable development that apply methods of information theory include, e.g. Fath and Cabezas (2004); P. E. Meyer, Kontos, Lafitte and Bontempi (2007); and Pawlowski, Fath, Mayer and Cabezas (2005). These are based on the parametric Fisher information, but the non-parametric entropy should be preferred (see Section 4.3.7.4). Entropy-based index approaches include, e.g. Rajsekhar, Singh and Mishra (2015); Ulanowicz, Goerner, Lietaer and Gomez (2009); and Y. Zhang, Yang and Li (2006). Furthermore, Nie, Lv and Gao (2017) apply information-theoretic entropy and the multilevel perspective on technological change (see Section 2.3.1; Geels, 2002) to develop an index for power system transitions. An example of an entropy-based application in a broader context of sustainable development includes Wang et al. (2015), who assess sustainable development capacities with an entropy-based weighting coefficient. However, to the best of the author's knowledge, aggregated sustainable development performances have not been estimated by means of information-theoretic entropy. The application of an information-theoretic algorithm to tackle synergies and trade-offs of individual sustainable development elements constitutes the major methodological contribution of this work. Moreover, this study is the first to compare two multivariate statistical techniques – the PCA and the PTA – to an information-theoretic approach. It is also the first to estimate the three weighting methods' sensitivities on four composite measures of sustainable development.

The fourth identified research gap – the sustainability gap – regards the bottleneck of the science-practice linkage (see Section 2.3.4; e.g. Hall et al., 2017). The present work addresses this bottleneck by providing detailed information about its methodological approach and data sources, such that the MLSDI can be re-built by interested change agents. Furthermore, this work is the first to publish data on 44 sustainable development key indicators, three subindices, and an overall sustainable development index for 62 two-digit industries as well as five aggregated branches, including the cross-sectional health economy, in the German economy from 2008 to 2016. Providing detailed information about the methodological approach, data sources, and objective, macro-economic benchmarks entails two advantages: First, it enhances decision usefulness across the decisional tiers by identifying and improving relevant sustainable development key indicators; and second, it encourages corporations and further objects of investigation to compare their performances to the provided macro-economic benchmarks, preventing greenwashing.

Fifth and last, previous sustainable development indices not only lack compliance with the conceptual framework (see Section 3.3 and above), but they also violate the assessment principle of methodological soundness in particular (see Section 4.2). Insufficient data cleaning, weighting, aggregation, and a lack of sensitivity analyses are frequent shortcomings. This work has overcome these deficits and contributes a methodologically sound sustainable development index: The MLSDI imputes missing values and treats outliers, establishing credibility, validity, and reliability of measurement; it applies a sophisticated information-theoretic algorithm to objectively determine relevances and interconnections of individual sustainable development elements; it obeys mathematical aggregation rules for credibility, validity, and reliability; and it conducts sensitivity analyses, proving the measurement's robustness and confirming its previously claimed credibility, validity, and reliability.

Compared to the reviewed sustainable development indices, the MLSDI is the only index that can be deployed at multiple levels (see Table 4.5). Hence, it features a wider scope than the previous indices. Because the reviewed indices are distinct in their indicator bases and regional scopes, data results are not comparable, and the MLSDI is only related to the previous indices in respect of its methodology. The MLSDI may serve management decisions, national industry policy, and international affairs, whereas single level indices only address one level of decision making. For example, the DJSI supports corporate decision making, and the SSI assists international policy making by comparing country performances. In comparison with indices of single domains (e.g. the EPI; Esty & Emerson, 2018), the MLSDI supports decision making with regard to all three contentual domains of sustainable development. The MLSDI is based on 44 key indicators and exceeds the number of indicators of five of the nine reviewed indices. Previous indices with a narrower indicator base include the ICSD (Krajnc & Glavič, 2005), FEEM SI (e.g. Pinar et al., 2014), HSDI (e.g. Bravo, 2018), SDI (Bolcárová & Kološta, 2015), and the SSI (e.g. van de Kerk et al., 2014). Their number of indicators ranges from four to 38 (HSDI vs. ICSD, respectively). In conclusion, the MLSDI covers a broader range of essential topics in sustainable development performance measurement. Moreover, decision making based on the MLSDI will be more accurate in general because of its overall methodological soundness. Only one of the nine reviewed indices – the MISD (e.g. Shaker, 2018) – eliminates statistical biases by sound missing value imputation. Statistical biases that originate in outlying observations remain for all nine previous indices. With regard to scaling, three of the nine reviewed indices – the SDGI (e.g. Schmidt-Traub et al., 2017a), SSI, and the WI (Prescott-Allen, 2001) – apply a scaling method that interacts correctly with the deployed aggregation method. However, of these three indices, the SSI is the only index that implements geometric aggregation, which is essential to map the desired weak sustainability with minimised substitutability (see Section 2.2.4 and Table 4.1). Only one of the reviewed indices – the SDI – deploys the required bottom-up statistical weighting. The SDI determines weights by a PCA, a powerful tool that is used in further sustainable development indices (e.g. Barrios & Komoto, 2006; T. Li, Zhang, Yuan, Liu & Fan, 2012) and adjacent fields of quantitative investigations of sustainable development (e.g. Fernandez-Feijoo, Romero & Ruiz, 2014; Hansmann, Mieg & Frischknecht, 2012; Wallis, 2006). Nonetheless, the methodological and empirical analyses have shown that the information-theoretic algorithm outperforms this multivariate statistical technique because both linear and higher order correlations are detected. Among the reviewed indices, the MLSDI is the only index that implements an information-theoretic algorithm (see above) and hence contributes a major methodological advancement to the index literature in general. Last, only three of the reviewed indices – the FEEM SI, SDGI, and the SSI – investigate sensitivities. The MLSDI improves on their sensitivity analyses by intending to investigate three calculation steps instead of one or two steps only. However, testing sensitivities of missing value imputation becomes superfluous, given Amelia II's failure (see Section 5.2.2).
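The statement that an information-theoretic measure detects higher order correlations that a linear technique misses can be illustrated with a toy example. The sketch below is not the MRMRB algorithm; it merely contrasts the Pearson correlation coefficient with a plug-in estimate of Shannon mutual information on hypothetical discrete data in which one variable is a purely quadratic function of the other. All names and values are illustrative assumptions.

```python
import math
from collections import Counter


def pearson(xs, ys):
    """Pearson correlation coefficient, which captures linear dependence only."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def mutual_information(xs, ys):
    """Plug-in estimate of mutual information (in bits) for discrete samples."""
    n = len(xs)
    count_x, count_y = Counter(xs), Counter(ys)
    count_xy = Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log2((c / n) / ((count_x[x] / n) * (count_y[y] / n)))
        for (x, y), c in count_xy.items()
    )


# Hypothetical data: y depends on x only through a quadratic relationship.
xs = [-2, -1, 0, 1, 2]
ys = [x ** 2 for x in xs]

print("Pearson correlation:", pearson(xs, ys))
print("Mutual information (bits):", mutual_information(xs, ys))
```

In this constructed case, the linear correlation vanishes although y is fully determined by x, whereas the mutual information is clearly positive. A weighting scheme based solely on linear correlations would therefore treat the two variables as unrelated.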

#### **6.2 Implications for practice**

The present work provides several implications for corporate and political practices on sustainable development. This work encourages practitioners to always view sustainable development as one integrated crisis of environmental protection, social development, and economic prosperity (see Section 2.2.4; WSSD, 2002). The economic domain is hallmarked by the misconception that economic growth or profits are part of sustainable development. This work reminds practitioners to eliminate this misconception (see Section 2.2.3; e.g. Jackson, 2009; Vermeulen, 2018). The present study advises corporate practitioners to follow societal instrumental finality (see Section 2.3.2; e.g. T. Hahn & Figge, 2011): Corporate sustainability is not about the long-term survival of the company (i.e. profits); rather, corporations should contribute to the society-level concept of sustainable development. In fact, their contributions are indispensable for achieving the SDGs (see Section 2.3.1; e.g. Dahl, 2012; Griggs et al., 2014). Furthermore, this work recommends that politicians abandon GDP (i.e. economic growth) as a measure of societal wellbeing (see Section 3.3.3; Costanza, Fioramonti & Kubiszewski, 2016) and replace it with the MLSDI, which reflects progress comprehensively and soundly. However, the political will to let go of GDP might be lacking (Jesinghaus, 2018).

This work further provides practitioners with an updated compilation of sustainable development assessment principles, which should be considered in any sustainable development assessment. The present study also delivers an updated overview of sustainable development methods. For practitioners, the provided overview by aggregational size might be easier to follow than, for example, overviews that are structured by the methodological approach (see Section 3.2; e.g. Sala et al., 2015). Practitioners might be unaware of the methods required for their problem setting, but they most likely know if they want to appraise, among others, a product, corporation, or a policy. The evaluation of sustainable development assessment methods by means of the assessment principles (see Section 3.2) entails two implications for practice. First, this work delivers an understanding of each method, and second, the present study encourages practitioners to implement sustainable development indicator sets that include a composite measure if they aim to comprehensively measure sustainable development performances of multilevel objects. Moreover, the evaluations of assessment principle compliances (see Section 3.3) and methodological approaches (see Section 4.2) of previous sustainable development indices result in two implications for practice. First, this work informs practitioners about existing alternatives of sustainable development indices. Second, the present study provides practitioners with information about "do's" and "don'ts" in sustainable development index construction with regard to both the conceptual and the methodological phase. Concerning the methodology, this work discloses profound knowledge, such that the MLSDI can be re-built (see Section 6.1). Probably the most important methodological aspect for corporations provided in this work is the utilisation of GVA instead of revenues, sales, or profits (see Section 4.3.4). 
By means of the derived effectiveness and efficiency indicators, the present study supports practitioners in managing absolute and relative decoupling of sustainable development influences and economic activity, respectively. This is a major challenge for decision makers (see Section 3.2; Holden et al., 2014). Furthermore, this work promotes the implementation of paradox teleological integration to practitioners. All indicators should be pursued at the same time, even if they are conflicting (see Section 2.3.2; e.g. T. Hahn & Figge, 2011). Moreover, this study delivers an advanced alignment of the GRI and the SDG frameworks at indicator level for the geographical region Germany. The indicator base is expected to be valid in further European countries. It further invites corporations that seek to report their performances on the macro SDGs to rely on this alignment. The provided alignment might be especially useful for corporations that are not able to allocate sufficient resources to report on the comprehensive option of the GRI framework but are not satisfied with the sparse core option. This study suggests collecting 36 key figures, a number that balances comprehensiveness and resources in practice. Further, this work encourages practitioners who are interested in data beyond the selected branches or Germany to take advantage of the benchmarking opportunities the MLSDI provides by enclosing detailed empirical analyses and data sources to reproduce the sample. Last, this work may support the action plan for financing green growth in the EU. First, the present study contributes to Action 1 of this plan, which encompasses the establishment of a unified classification system for sustainable activities, also termed "EU taxonomy" (EC, 2018). On the one hand, the derived conceptual framework (see Figure 2.11) may guide the establishment of the "shared understanding of what 'sustainable' means" (EC, 2018). 
On the other hand, the elaborated indicator set that is applicable to both the meso and the macro levels may support determining the environmental and the social objectives investors should aim for. Second and foremost, this work contributes to Action 5: developing sustainability benchmarks. More transparent and sounder methodologies of sustainable development indices are demanded in order to halt greenwashing (EC, 2018). The MLSDI and its well-researched and transparently exposed methodology (see Chapter 4) is capable to serve exactly this purpose.
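
The distinction between absolute and relative decoupling addressed in this section can be illustrated with a short sketch. It assumes that an effectiveness indicator tracks the change of an influence in absolute terms, while an efficiency indicator tracks the influence per unit of GVA. The function name, the classification labels, and all figures are hypothetical and are not taken from the MLSDI's calculation.

```python
def decoupling_status(impact_start, impact_end, gva_start, gva_end):
    """Classify decoupling of an environmental impact from economic activity.

    Hypothetical helper for illustration only; GVA values must be positive.
    """
    if gva_start <= 0 or gva_end <= 0:
        raise ValueError("GVA values must be positive")
    if gva_end <= gva_start:
        # Decoupling is only meaningful here while the economy grows.
        return "no growth"
    if impact_end < impact_start:
        # The impact falls in absolute terms while GVA grows (effectiveness).
        return "absolute"
    if impact_end / gva_end < impact_start / gva_start:
        # The impact grows, but more slowly than GVA (efficiency).
        return "relative"
    return "none"


# Hypothetical emissions (kt) and GVA (million EUR) for two points in time.
print(decoupling_status(100.0, 90.0, 1000.0, 1100.0))
print(decoupling_status(100.0, 105.0, 1000.0, 1100.0))
print(decoupling_status(100.0, 120.0, 1000.0, 1100.0))
```

The second case shows why both indicator types are needed: efficiency improves although the absolute influence still rises.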

#### **6.3 Limitations and future outlook**

Several limitations remain and may be investigated in future research. The social domain requires further conceptual development. The leading framework of the social boundaries (see Section 2.2.2; e.g. Raworth, 2017) mostly applies to needs of the developing, not the developed world. Because Maslow's hierarchy of needs (e.g. Maslow, 1987) covers needs of both developing and developed countries, an alignment of the social boundaries and Maslow's hierarchy of needs might be expedient (see Section 2.2.2). Further research on the concept of needs and possible harmonisations should be carried out. Similar to the concept of the planetary boundaries (see Section 2.2.1; e.g. Steffen et al., 2015), the finalised framework of social boundaries should be able to verify an indicator's relevance towards sustainable development (see Section 5.3.1.1 and Section 5.3.1.2).

The consideration of multiple levels sacrifices detailed analysis within one level.

In contrast to footprints, indicator sets typically report sustainable development performances of one object of investigation while disregarding upstream or downstream sustainable development performances. To deliver a holistic picture of the supply chain, the MLSDI should be combined with footprint analyses: A multilevel sustainable development footprint should be derived in future research. A combination of the multilevel index with single level life cycle assessment, a powerful tool to quantify a product's sustainable development performance, for example, from "cradle to grave" (see Section 3.2; Finnveden et al., 2009), might also spread interesting insights but could be methodologically challenging. Topics such as economic proximity (e.g. Torre & Zuindeau, 2009) are only reflected in the performance scores, and benefits that economic objects may experience through proximity cannot be analysed in detail. The literature review is limited by the definition of sustainable development indices, but indices that are not included in the review might provide valuable methodological insights. Further indices that apply information-theoretic weighting have been outlined in Section 6.1.

Moreover, the MLSDI's methodology is subject to several limitations. Adjustments of current prices of key figures reported in monetary units would increase methodological soundness (see Section 4.3.1) because nine years of calculation are covered, and efficiency indicators rely on both monetary and non-monetary units. An iterative algorithm on the single missing value imputation that matches the aggregated branches would refine the imputation results (see Section 4.3.3.2) and also enhance methodological soundness. The multiple missing value imputation by the Amelia II algorithm might not only fail because of the violation of the normality assumption, but because outliers are still present (see Section 4.3.3.3 and Section 5.2.2). An iterative algorithm over the calculation steps missing value imputation and outlier treatment could be tested. Only one micro index – the BLI (see Section 3.3.3; OECD, 2017) – has been identified in the literature, and the MLSDI's key indicator base is currently limited to the alignment of the meso GRI and the macro SDG frameworks (see Section 4.3.4). Further micro indices and a micro framework should be developed. Literature to verify the GRI and the SDG frameworks might unfold gaps and weaknesses in these reporting schemes. Conflicts might be present (Spaiser et al., 2017), and the frameworks' reflections of the planetary and the social boundaries (i.e. the safe and just operating space) could be investigated. Despite theoretical justifications, more sophisticated outlier detection and treatment methods could be explored in future studies because the conducted sensitivity analyses have revealed the importance of this calculation step. As the information-theoretic algorithm outperformed established multivariate statistical methods for weighting, information-theoretic outlier detection and treatment might be of interest. Further information can be found in, e.g. Aggarwal (2017).
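
The price adjustment suggested at the beginning of this paragraph can be sketched as follows. The example is a minimal illustration that assumes a price index with the base year set to 100. The function name, the index values, and the monetary figures are hypothetical and do not stem from the German sample.

```python
def to_constant_prices(nominal, price_index, base=100.0):
    """Convert values at current prices into constant prices of the base year.

    Hypothetical helper: each nominal value is divided by the relative price
    level of its year (price_index / base).
    """
    if len(nominal) != len(price_index):
        raise ValueError("nominal and price_index must have the same length")
    return [value * base / index for value, index in zip(nominal, price_index)]


# Hypothetical GVA at current prices and a hypothetical price index.
gva_current = [200.0, 210.0, 231.0]
index = [100.0, 105.0, 110.0]
gva_constant = to_constant_prices(gva_current, index)
print(gva_constant)
```

In this sketch, part of the nominal increase disappears after the adjustment, which matters for efficiency indicators that divide a non-monetary key figure by a monetary one over several years.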

Probably the major limitation of the MLSDI is the applied internal scaling (see Section 4.3.6.2). Targets and boundaries are excluded due to unavailable data. Results depend on the distribution, and their significance is reduced. For example, there will still be well-performing economic objects even if all objects feature a bad performance (Dahl, 2018). Therefore, the safe and just operating space must be converted into lower aggregational levels of corporations, industries, and nations expressed in terms of the SDGs (Dahl, 2018; Schmidt-Traub et al., 2017a; Steffen et al., 2015). Research on this breakdown only emerged recently and especially lacks the connection of the safe and just operating space and the SDGs. Probably the most relevant study was released by O'Neill et al. (2018), who break the planetary and the social boundaries down to the level of 150 nations. Linkage of the planetary boundaries and the SDGs is not available as a peer-reviewed contribution yet (Randers, Rockström & Stoknes, 2019), and literature regarding the nexus of the social boundaries and the SDGs could not be identified. Other adjacent studies, for example, design a framework for translating the planetary boundaries into fair shares at national levels (Häyhä, Lucas, van Vuuren, Cornell & Hoff, 2016), develop a methodology to assess a country's contribution to transgressing the planetary boundary phosphorus (M. Li, Wiedmann & Hadjikakou, 2019), or investigate whether growth has occurred within the planetary boundaries (i.e. genuine green growth) (Stoknes & Rockström, 2018). Studies that deal with linking corporate sustainability and the planetary boundaries include, e.g. Antonini and Larrinaga (2017); Dahlmann, Stubbs, Griggs and Morrell (2019); Haffar and Searcy (2018); and Whiteman et al. (2013). Nonetheless, to the best of the author's knowledge, the safe and just operating space has been disassembled neither to corporate nor to industry level yet. Consequently, targets and boundaries could not be included in the German sample (nor in any other geographical region). Methods and precise data generation at corporate, industry, and national levels of the planetary and the social boundaries constitute a major future field of research. 
The MLSDI connects to this new stream: Once the boundaries are broken down, these data can be fed in the MLSDI to precisely quantify a meso object's contribution to the macro SDGs. Moreover, the boundaries' scientific relationship must be known and hence explored in future research for accurate weighting (see Section 4.3.7; e.g. Ebert & Welsch, 2004; Steffen et al., 2015), making statistical weighting obsolete.
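
The dependence of internally scaled results on the distribution can be illustrated with a small sketch. It uses min-max rescaling as one common form of internal scaling, which is an assumption made for illustration; the scaling actually applied is described in Section 4.3.6.2. The external target, the function names, and the data are hypothetical.

```python
def rescale_internal(values):
    """Min-max rescaling: scores depend only on the sample's own range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("values must not all be equal")
    return [(v - lo) / (hi - lo) for v in values]


def rescale_to_target(values, worst, target):
    """Rescaling against an external boundary, clipped to the interval [0, 1]."""
    if target == worst:
        raise ValueError("target and worst must differ")
    return [min(max((v - worst) / (target - worst), 0.0), 1.0) for v in values]


# Hypothetical emission reductions in per cent; the assumed target is 20 per cent.
reductions = [1.0, 2.0, 3.0]
internal = rescale_internal(reductions)
external = rescale_to_target(reductions, worst=0.0, target=20.0)
print(internal)
print(external)
```

Under internal scaling, the best object receives the top score although it misses the assumed target by a wide margin. Scaling against a downscaled boundary would reveal that all three objects perform poorly.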

Furthermore, the three applied weighting methods (see Section 4.3.7) will never assign zero weights because indicators that are not perfectly correlated always add variation to the data set. The indicator selection and derivation process (see Section 4.3.4) cannot be reverted. Weighting across the contentual domains currently fails, and the sum of weights of one domain reflects the number of included key indicators. Subsequent adjustment is accomplished (see Section 4.3.7), but the MLSDI remains biased towards efficiency. More ratio than growth indicators are comprised without subsequent adjustments. Further research is required to develop methods that implicitly account for unbalanced numbers of indicators. The equal temporal weighting of the MRMRB algorithm is justified by the PTA's temporal weights (see Section 4.3.7.4 and Section 5.4.3). This procedure might be inaccurate as the PC family is generally outperformed. Structures of the temporal dimension could be investigated by information-theoretic applications in future studies. To strengthen the MRMRB algorithm's empirical results, sensitivities of discretisation methods could be tested. Despite successful punishment of bad performances by the geometric aggregation, the MLSDI is not capable of indicating urgency. This judgement remains with decision makers and is hence subjective. Sensitivity analyses could be advanced as OAT is generally criticised in the literature. More sophisticated methods are available (Saltelli & Annoni, 2010; Saltelli et al., 2008).
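
The punishment of bad performances by geometric aggregation mentioned above can be shown with a brief comparison of a weighted arithmetic and a weighted geometric mean. The sketch assumes strictly positive rescaled scores and weights that sum to one. The scores, the equal weights, and the function names are hypothetical and do not reproduce the MLSDI's weights.

```python
import math


def arithmetic_aggregate(scores, weights):
    """Weighted arithmetic mean: bad scores are fully compensated by good ones."""
    return sum(w * s for s, w in zip(scores, weights))


def geometric_aggregate(scores, weights):
    """Weighted geometric mean: compensation between scores is only partial."""
    if any(s <= 0 for s in scores):
        raise ValueError("scores must be strictly positive")
    return math.prod(s ** w for s, w in zip(scores, weights))


weights = [1 / 3, 1 / 3, 1 / 3]
unbalanced = [0.9, 0.9, 0.1]         # one bad performance
balanced = [sum(unbalanced) / 3] * 3  # same arithmetic mean, no bad performance

print(arithmetic_aggregate(unbalanced, weights), geometric_aggregate(unbalanced, weights))
print(arithmetic_aggregate(balanced, weights), geometric_aggregate(balanced, weights))
```

Both profiles share the same arithmetic mean, but the geometric mean of the unbalanced profile is clearly lower. The aggregate alone still does not tell a decision maker how urgent the bad performance is.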

The current sample is limited to meso-level and macro-level applications because micro-level frameworks are not available. For a complete micro-to-macro connection, micro frameworks must be developed, and macro boundaries must be downscaled to lower aggregational levels (see above). To demonstrate the MLSDI's capability of implementing the multilevel perspective and to highlight the benchmarking opportunities across aggregational levels, an empirical application to meso objects (i.e. corporations) should be performed in future research. Data sources are attached in the supplementary material to facilitate future applications. Generally, the change agent group society is underrepresented in the present sample. Business is involved by constituting the objects of investigation, policy is reflected by the SDG framework, and science is included by the investigation itself. Incorporating micro objects of investigation (i.e. individuals) would solve these two limitations simultaneously. Moreover, the present indicator selection exhibits several limitations. First, the inclusion of more indicators in the MLSDI is desirable to cover all multilevel aspects of the SDGs, but further data are missing for the German sample. Second, interpretability of existing indicators may be limited. For example, the environmental tax intensity, which is the ratio of environmental taxes and the GVA, rises if more environmental taxes are paid. On the one hand, the increase affects sustainable development positively because pollution is paid for. On the other hand, more taxes are paid because more pollution is generated, harming sustainable development. Effectiveness as well as efficiency of a taxation system remains subject to further investigations (see Section 5.3.1.1). Regarding the social domain, the VAT's effective direction may also be questionable as the VAT is a non-progressive tax on an economic object's created value added. 
Financially well-placed economic objects are burdened equally in nominal terms as economic objects in weaker financial positions. The latter might suffer from financing social development. The key indicators on apprentices might be limited in their explanatory power. The number of university students may complete the picture on education, and data on labour market demands by educational level are required to draw reliable conclusions on the effective directions key indicators on education should carry. Indicators on trade also feature ambiguities. First, trade's effect on sustainable development may be ambiguous in general. Further information on the contribution of trade to the SDGs can be found in, e.g. WTO (2018). Second, Germany's net import intensity might not indicate support for developing countries. Products are mainly imported from the People's Republic of China, the Netherlands, France, the United States of America, and Italy (descending order; Destatis, 2019a). Only China is an economy in transition, while the other countries of origin are developed countries (UN, 2019c). The poor to medium performances of the capital indicators entail uncertain interpretations. Classically, a decrease in capital indicators is interpreted negatively. However, in the era of big data and digitalisation, economic prosperity might be achievable despite decapitalisation and disinvestment – the IT industry stands out as the best performer (see Section 5.5.2).
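
The ambiguity of the environmental tax intensity discussed above can be made explicit with a stylised calculation. The sketch assumes, purely for illustration, that environmental taxes equal a tax rate multiplied by the amount of pollution. This simplification, the function name, and all figures are hypothetical.

```python
def environmental_tax_intensity(tax_rate, pollution, gva):
    """Environmental taxes per unit of GVA, with taxes simplified to rate x pollution."""
    if gva <= 0:
        raise ValueError("GVA must be positive")
    return tax_rate * pollution / gva


# Hypothetical baseline and two scenarios with identical GVA.
baseline = environmental_tax_intensity(tax_rate=10.0, pollution=100.0, gva=50_000.0)
stricter_policy = environmental_tax_intensity(tax_rate=20.0, pollution=100.0, gva=50_000.0)
more_pollution = environmental_tax_intensity(tax_rate=10.0, pollution=200.0, gva=50_000.0)

print(baseline, stricter_policy, more_pollution)
```

Both scenarios raise the indicator by the same amount, although only the first reflects a stricter pricing of pollution. The indicator alone therefore cannot reveal which of the two effects drives an increase.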

Forward-looking scenarios as approached by, e.g. Carraro et al. (2013; see Section 3.3.3) should be explored to develop future pathways for comprehensive multilevel solutions by means of the MLSDI (see Section 2.2.4 and Section 2.3.4; e.g. Lang et al., 2012; Leach et al., 2013). A forecast of six SDG indicators can be found in Joshi, Hughes and Sisk (2015), and a review that provides assistance for national SDG scenario modelling can be found in Allen, Metternicht and Wiedmann (2017). More research is required in this field. The MLSDI's current selection of key indicators focuses on developed countries such as Germany. For instance, growth indicators of the economic domain are disregarded because Germany is one of the major economies of the world (see Section 5.3.1.3; UN, 2019c). However, the SDGs are universally applicable to all countries (see Section 2.3.3; e.g. Glaser, 2012), inviting multinational applications and country comparisons. In such applications, outlier thresholds, scales, and weights must be homogeneous. To evaluate the usefulness of national vs. multinational calculations, the MLSDI's sample should be enlarged to explore both scopes.

Effectiveness of performance measurement by the MLSDI is not investigated in the present work. Testa et al. (2018) find that greenwashing does not pay off (see Chapter 3). However, more case studies on the use of sustainable development indicators are required (Bell & Morse, 2018) to further evaluate the influence of sustainable development indicator sets that include a composite measure on sustainable development performance. Do such indicator systems only entail a bureaucratic burden, or do they trigger improved sustainable development performances? Research on the nexus of sustainable development indicators and sustainable development performances includes, e.g. Bond and Morrison-Saunders (2013); Bond et al. (2015); and Ramos and Caeiro (2010), but further studies are needed. Additionally, future research should investigate whether indicators or other tools should be mandatory and standardised in view of effectiveness of measurement, supporting political decisions on reporting regulations. Last, the usefulness of comprehensive, multilevel indicators and indices for managerial and political decision making might be explored in future studies.

#### **6.4 Summary and conclusion**

In this dissertation, a methodologically sound sustainable development index that is applicable to the micro, meso, and the macro levels has been developed. Multilevel assessment is crucial because the society-level concept of sustainable development can only be achieved if micro and meso objects contribute. Moreover, methodological soundness is a prerequisite for serving as a credible, valid, and reliable basis for decision making.

First, this work has elaborated a conceptual framework and assessment principles of sustainable development. Based on these, indicator sets that include a composite measure have been shown to be most successful in comprehensively quantifying multilevel sustainable development performances. A new index – the MLSDI – has been derived by linking the conceptual framework and the assessment principles to each index calculation step. The empirical analysis has confirmed the accuracy and robustness of the MLSDI's methodology. For improved sustainable development, priority should be given to environmental efficiency indicators on climate change and social effectiveness indicators on employment as well as to the chemical industry's environmental performances and the agricultural industry's performances in all three contentual domains.

Manifold implications for research and practice follow from the conducted research. This work is the first to contribute a methodologically sound multilevel indicator set and a multilevel index (perspective gap) that address operational, strategic, and normative tiers (operational-to-normative gap). It is also the first to deploy an entropy-based, information-theoretic algorithm to examine interactions of individual sustainable development elements (knowledge gap). This work provides unrestricted transparency for replicability (sustainability gap), and the MLSDI serves a wide scope of managerial and political decision-making purposes. An alignment of the meso GRI and the macro SDG frameworks at indicator level is delivered for corporate practice, and politicians are encouraged to replace GDP as a measure of wellbeing with the MLSDI.

In conclusion, the usefulness of the suggested approach for informed managerial and political decision making is expected to be high from both theoretical and methodological viewpoints but remains subject to further investigations at the micro, meso, and the macro levels to succeed in the long-term goal and vision of sustainability.

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

### **Appendix**

### **A.1 Statistical classification scheme of economic activities in the European Union (EU)**





**Table A.1** Sections and divisions in the German economy according to the Statistical Classification of Economic Activities in the European Community (NACE) (Eurostat, 2008b); †, omitted in the present calculation; n/a, not applicable

### **A.2 German health economy's statistical delimitation**




**Table A.2** German health economy's stakes in divisions at two-digit level in percentage from 2008 to 2016; see Table A.1 for denotation of section codes

### **A.3 Statistical tests of sustainable development key figures**


**Table A.3** Environmental key figures' test statistics and p-values of the Shapiro-Wilk (SW), Kolmogorov-Smirnov (KS), augmented Dickey-Fuller (aDF), and the Ljung-Box (LB) tests; \*\*, p-values ≤ 0.01; \*\*\*\*, p-values ≤ 0.0001


**Table A.4** Social key figures' test statistics and p-values of the Shapiro-Wilk (SW), Kolmogorov-Smirnov (KS), augmented Dickey-Fuller (aDF), and the Ljung-Box (LB) tests; \*\*, p-values ≤ 0.01; \*\*\*\*, p-values ≤ 0.0001; CIT, Corporate Income Tax; VAT, Value Added Tax



**Table A.5** Economic key figures' test statistics and p-values of the Shapiro-Wilk (SW), Kolmogorov-Smirnov (KS), augmented Dickey-Fuller (aDF), and the Ljung-Box (LB) tests; \*, p-values ≤ 0.05; \*\*, p-values ≤ 0.01; \*\*\*\*, p-values ≤ 0.0001; GVA, Gross Value Added; R&D, Research and Development

### **A.4 Summary statistics of the sustainable development key indicators**




**Table A.6** Summary statistics of the environmental key indicators in the German economy from 2008 to 2016; Max, Maximum; Min, Minimum; Q1, 25th percentile; Q3, 75th percentile





**Table A.7** Summary statistics of the social key indicators in the German economy from 2008 to 2016; CIT, Corporate Income Tax; Max, Maximum; Min, Minimum; p.c., per capita; p.h., per hour; Q1, 25th percentile; Q3, 75th percentile; VAT, Value Added Tax





**Table A.8** Summary statistics of the economic key indicators in the German economy from 2008 to 2016; GVA, Gross Value Added; Max, Maximum; Min, Minimum; p.c., per capita; p.h., per hour; Q1, 25th percentile; Q3, 75th percentile; R&D, Research and Development

### **A.5 Outlier thresholds of the sustainable development key indicators**


**Table A.9** Environmental key indicators' upper and lower outlier thresholds; †, theoretical threshold (domain ≥ 0)



**Table A.10** Social key indicators' upper and lower outlier thresholds; †, theoretical threshold (domain ≥ 0); CIT, Corporate Income Tax; p.c., per capita; p.h., per hour; VAT, Value Added Tax



**Table A.11** Economic key indicators' upper and lower outlier thresholds; †, theoretical threshold (domain ≥ 0); GVA, Gross Value Added; p.c., per capita; p.h., per hour; R&D, Research and Development

### **A.6 Normality tests of z-score scaled sustainable development key indicators**


**Table A.12** Z-score scaled environmental key indicators' average test statistics and p-values of the Shapiro-Wilk (SW) and the Kolmogorov-Smirnov (KS) tests from 2008 to 2016



**Table A.13** Z-score scaled social key indicators' average test statistics and p-values of the Shapiro-Wilk (SW) and the Kolmogorov-Smirnov (KS) tests from 2008 to 2016; CIT, Corporate Income Tax; p.c., per capita; p.h., per hour; VAT, Value Added Tax



**Table A.14** Z-score scaled economic key indicators' average test statistics and p-values of the Shapiro-Wilk (SW) and the Kolmogorov-Smirnov (KS) tests from 2008 to 2016; GVA, Gross Value Added; p.c., per capita; p.h., per hour; R&D, Research and Development

**Figure A.1** Frequency distribution of z-score scaled average compensation of employees per capita (p.c.) and consumed capital productivity in the German economy in 2016







**Figure A.2** Frequency distribution by the four composite measures and the three outlier detection methods in rescaled performance scores in the German economy in 2016; α, outlier coefficient; MLSDI, Multilevel Sustainable Development Index








**Figure A.3** Frequency distribution of the four composite measures by the three weighting methods in rescaled performance scores in the German economy in 2016; MLSDI, Multilevel Sustainable Development Index; MRMRB, Maximum Relevance Minimum Redundancy Backward algorithm; PCA, Principal Component Analysis; PTA, Partial Triadic Analysis

### **References**


© The Author(s) 2021 C. Lemke, *Accounting and Statistical Analyses for Sustainable Development*, Sustainable Management, Wertschöpfung und Effizienz, https://doi.org/10.1007/978-3-658-33246-4


Latouche, S. (2009). Farewell to growth. Cambridge: Polity Press.


Australian conference on artificial intelligence (Chap. 37, pp. 440–452). Berlin: Springer.

